Screening, Diagnosis and Prognosis of Autism and Other Developmental Disorders

ABSTRACT

The invention provides a method and system combining functional genomic and genetic, proteomic, anatomic neuroimaging, functional neuroimaging, behavioral and clinical measurements and data analyses for pediatric population screening, diagnosis or prognosis of autism. More specifically, the invention provides a weighted gene and feature test for autism that uses a weighted gene signature matrix for comparison to a reference database of healthy and afflicted individuals. The invention also provides normalized gene expression value signatures for comparison to a reference database. The invention additionally combines either the weighted gene or the normalized gene analysis with comparisons to a gene-networks signature matrix, a multi-modal signature matrix, and a collateral features signature matrix for improved accuracy and screening, diagnostic and prognostic relevance for autism, particularly for newborns, babies aged birth to 1 year, toddlers aged 1 to 2 years, toddlers aged 2 to 3 years and young children aged 3 through 4 years.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of PCT Application No. PCT/US2013/052094 filed Jul. 25, 2013, which claims priority to U.S. Provisional Application No. 61/675,928, filed Jul. 26, 2012, the entire contents of which are incorporated herein by reference.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH

This invention was made with government support under grant Nos. P50-MH081755, R01-MH080134, and R01-MH036840 awarded by the National Institute of Mental Health (NIMH). The government has certain rights in the invention.

FIELD OF THE INVENTION

The invention relates generally to screening, diagnosis and prognosis of autism and other developmental disorders. More specifically, the invention relates to the use of a combination of functional genomic signatures and multimodality signatures in screening for autism risk and in autism diagnostics and prognostics. Its prognostic use includes prediction and characterization of likely clinical, neural and treatment progress and outcome.

BACKGROUND OF THE INVENTION

It is of the greatest importance to improve early screening and detection of risk for autism, a genetically complex neural developmental disorder affecting higher order functions such as social behavior, communication, language and cognition. Among the benefits of early detection, accelerating the pace of identification and treatment by even a year¹ can have a considerable impact on the outcome of affected newborns, infants, toddlers and young children.

Despite recent university-based research advances in the development of potential methods for screening, detection and diagnostic evaluation for autism within the first 2 years of life, the clinical translation of these methods into widespread and effective community practice in the US has not occurred. Instead, 3 to 5 years of age continues to be the age of first clinical identification and referral for treatment services for autism in much of the US¹. Studies find that, on average, a child with autism is diagnostically evaluated by 4 to 5 different professionals before a final diagnosis is determined, and this process can take several years during which the child does not receive suitable treatment. From a neurobiological perspective, this is particularly problematic given that functional connections in the brain are strongly established during the first few years of life^(2,3). Starting treatment after many neural connections have already been formed (rather than before) will likely reduce treatment efficacy and impact. Hundreds of websites, articles, blogs and government, professional and private organizations cite the need for early screening, detection, diagnosis and treatment referral for children with autism, yet the gulf separating university-based research advances in early detection from actual community clinical practice is alarming. For example, in 2012 the CDC documented that the median age of autism identification in the US (based on 2008 data) is about 4 years¹. The median age of treatment referral is correspondingly even later in the US. Further, there remain large underserved segments of the population, both in terms of early screening and access to empirically validated early intervention. The magnitude of the problem is staggering: given recent prevalence estimates and the U.S. birth rate, every year 52,000 to 84,000 infants will go on to develop autism. Thus, there is an enormous and urgent need for useful and cost-effective pediatric population screening strategies in ordinary community settings throughout the U.S. Unfortunately, hundreds of thousands of toddlers and young children with autism in the U.S. are presently overlooked and under-treated, and may have a poorer outcome than need be.

Moreover, once children are identified as having an autism spectrum disorder (ASD), science has not yet offered insight into prognosis. Will the child face consistent, extreme barriers in speech, language and social development, or will he or she fall into the minority of ASD individuals who enjoy success in school and beyond? Presently, there are no prognostic biomarkers of autism; specifically, there is a lack of prognostic biomarkers that predict and characterize likely clinical, neural and treatment progress and outcome.

Despite its importance, the high-priority goal of discovering behavioral or biological risk markers with clinical impact remains largely unfulfilled. Neither biological nor behavioral markers have emerged that fulfill this need in clinical settings for the general pediatric population. For example, commonly used parent-report screens (e.g., M-CHAT, CSBS) have valuable strengths, but also weaknesses⁴⁻⁶, including very high false positive rates. The M-CHAT has very low specificity (27%⁵) and positive predictive value (PPV, 11%) when used in the general population, rendering it of limited utility in routine clinical practice. Similarly, the newest and largest study to test the efficacy of the M-CHAT, conducted by Chlebowski, Robins, Barton & Fein and published in 2013, found an 80% false positive rate when the tool was used alone⁸. Although high-risk baby-sibling studies by Zwaigenbaum⁹, Ozonoff¹⁰, Paul¹¹, Landa¹² and others have revealed key early deficits such as abnormalities in social attention⁹, they report data only at the group level and have not reported validation statistics such as PPV, which are a necessary first step in determining the utility of a behavioral trait as an early marker.
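Why low specificity produces low PPV in a general-population screen follows from Bayes' rule: when few screened children are affected, even a modest false positive rate among the unaffected majority swamps the true positives. The minimal Python sketch below makes that arithmetic concrete. Only the 27% specificity comes from the text; the 90% sensitivity and 2% prevalence are illustrative assumptions, not values reported for the M-CHAT.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: probability that a child who screens positive is truly affected."""
    true_pos = sensitivity * prevalence                   # affected and screen positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # unaffected but screen positive
    return true_pos / (true_pos + false_pos)


def false_positive_share(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Fraction of all positive screens that are false positives, i.e. 1 - PPV."""
    return 1.0 - positive_predictive_value(sensitivity, specificity, prevalence)


if __name__ == "__main__":
    # Only the 27% specificity is taken from the text; sensitivity (90%) and
    # prevalence (2%) are illustrative assumptions.
    ppv = positive_predictive_value(sensitivity=0.90, specificity=0.27, prevalence=0.02)
    print(f"PPV = {ppv:.3f}, false-positive share = {1.0 - ppv:.3f}")
```

Under these assumed inputs the PPV is about 2.5%, so roughly 97 to 98 of every 100 positive screens are false positives. Because the inputs are hypothetical, the result illustrates the mechanism rather than reproducing the published 11% PPV.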

Several groups have used eye tracking and reported reduced preference for biological motion²³, reduced fixation to the eye region²⁴ and head region²⁵, and difficulties in joint attention²⁶ as well as in scene monitoring during explicit dyadic cues²⁷ in ASD relative to typically developing (TD) toddlers. While collectively these studies point to early developmental origins of social dysfunction, the reported effects are subtle, results are provided only at the overall group level, and they have very weak power to detect or diagnose ASD. For example, in one study, fixation towards the face and eye region did not differ between ASD and TD toddlers when the toddlers watched a woman make a sandwich; differences became evident only during a specific 3-second dyadic bid condition²⁷. Moreover, validation statistics that are needed to translate eye tracking into a screening tool, such as specificity or positive predictive value, are not provided in most eye-tracking studies of ASD toddlers.

While great strides have been made in understanding possible genetic risk factors¹³⁻¹⁵ and neural bases¹⁶⁻¹⁸ of autism, neither the gene nor the brain abnormalities published to date have translated into practical clinical population screens or tests of risk for autism in toddlers. Also, links between genetic and neural developmental abnormalities at young ages have remained largely unknown. Overall, research on potential genetic and neuroimaging biomarkers has remained largely “in the lab.”

The discovery by one of the present inventors¹⁹ that a substantial percentage of autistic infants and toddlers display early brain overgrowth indicates that autism might involve abnormalities in mechanisms that regulate cell production or natural apoptosis in early life. The inventor analyzed dysregulation of genetic mechanisms in autism in two ways. First, the total number of neurons in prefrontal cortex tissue from postmortem autistic boys was counted, revealing a 67% excess of neurons¹⁸. Second, evidence was obtained of dysregulation of the genetic mechanisms that govern neuron number in prefrontal cortex brain tissue from postmortem autistic boys¹⁴.

These discoveries have advanced the general understanding of the neural and genetic bases of ASD, but not the early screening of ASD risk, diagnostic evaluation, or prognostic assessment of autism at the level of the individual child in the general pediatric population. While other studies raise the hope that MRI neuroimaging biomarkers might be identified for use with older children or adults already known to have autism, they have not demonstrated the ability to improve risk assessment at very young ages in the general pediatric population, when such assessment is most needed. Still other studies suffer from limitations such as being based only on data from multiplex ASD families^(18,19), leaving unaddressed the majority of autistic infants in the general population, or being based on algorithms that identify genes with little or no demonstrated relevance to the underlying brain maldevelopment in autism^(20,21).

Broadly speaking, “biomarkers” to date (e.g., genetic, molecular, imaging) have poor diagnostic accuracy, specificity and/or sensitivity; none have clinical outcome prognostic power; most are expensive; none are suitable as an early screening tool in community populations; and few have undergone serious clinical scrutiny and rigorous validation. For example, genetic findings have been generally non-specific, and the best-characterized CNVs (e.g., 16p11.2) can occur in schizophrenia, bipolar disorder and intellectual disability as well as in ASD. Few gene mutations are recurrent²². CNVs and recurrent genes combined account for a very small fraction, arguably about 5-10%, of all ASD individuals. Thus, current DNA tests detect only rare autism cases and lack specificity. Moreover, genetic tests released by several companies detect only a small percentage (5% to 20%) of ASD individuals; generally lack good specificity (because the CNV, gene mutation and SNP markers in these tests are also found in a wide variety of non-ASD disorders such as schizophrenia or bipolar disorder, as well as in non-symptomatic, “typical” individuals); miss the vast majority of ASD individuals; and are very expensive and out of reach of most individuals. A genetic test targeting baby siblings of older ASD children provides only estimates of risk from less to more, but parents who already have a child with ASD already know that subsequent offspring are at risk. The benefit from this test is arguably small and of little practical clinical utility. No genetic finding has been shown to have clinical outcome prognostic power; that is, genetic testing does not provide information about likely later language, social or general functional progress and ability. A recent MRI “biomarker” works on adults with ASD, but diagnosis of ASD in adults is of very limited clinical value. A DTI study of small samples of infant siblings of older ASD children shows group differences too small to hold diagnostic promise. A gene expression classifier of previously diagnosed ASD 5- to 11-year-olds performed in a validation set with accuracy, sensitivity and specificity of only 67.7%, 69.2% and 65.9%, respectively²¹. A metabolomics classifier was tested only on a sample of 4- to 6.9-year-old children previously diagnosed with ASD and was not tested on newborns or on children from birth up to 4 years of age³².

In sum, no currently reported biomarker holds promise as a primary or secondary early developmental screen or an early diagnostic or prognostic tool in ordinary community pediatric settings at young ages, from birth through early childhood, when these clinical tools are most needed. There are no preclinical screens or tests for risk of developing ASD with the sensitivity and specificity needed for routine clinical application. Current expectations are that ASD is so etiologically and clinically heterogeneous that no diagnostic biomarker and/or combination of behavioral or biological markers is likely to do better than detect a small percentage of cases, and that such biomarkers and/or combinations of behavioral and biological markers will be either sensitive but non-specific, or specific but only for a tiny portion of the ASD spectrum.

SUMMARY OF THE INVENTION

The invention provides a leap beyond all current early screening, diagnostic and prognostic biomarker tests for ASD. In certain embodiments, the invention is unique because, among other advantages provided, it is the only approach utilizing multimodality (functional genomic, genetic, proteomic, anatomic neuroimaging, functional neuroimaging, and neurobehavioral) data combined with deep clinical phenotyping data, all from the same individual infants and toddlers representative of the general community pediatric population. Using complex bioinformatics methods in novel ways, the invention provides novel single and multimodality signatures of ASD.

In certain embodiments, the invention is unique in the identification of genes and gene-to-gene interactions (e.g., gene pathways, gene networks, and hub-gene activity patterns and organization, including quantifiable signature features) that, in combination with clinical, neuroimaging and behavioral information, have high accuracy, specificity and sensitivity for early screening, diagnostic evaluation and prognostic assessment for autism in subjects, particularly those at ages from birth to 1 year, 1 year to 2 years, 2 years to 3 years, and 3 years to 4 years, and older.

The invention provides highly surprising advantages for multiple reasons. ASD is thought to be highly etiologically and clinically heterogeneous, and yet the invention in certain embodiments can accurately detect the great majority (such as at least 82%) of cases, not just a small percentage of cases (which is the best that other current biological and behavioral ASD risk tests can do). There is no proven preclinical marker of ASD, and yet the invention can detect ASD with high accuracy, sensitivity and specificity before clinical symptom onset in the general natural pediatric population (not just in cases already suspected of being at high risk because of an older sibling with ASD, dysmorphology, seizures, etc.). By comparison, existing genetic tests have low specificity as well as poor sensitivity, detecting only 5% to 20% of ASD cases when tested in general preclinical pediatric populations. Claims of the prior art are exaggerated because they are based on tests performed on patients already highly suspected of having ASD because of prior clinical testing. The invention has surprisingly high accuracy, sensitivity and specificity in the natural pediatric setting, where early screening is a major unfilled need. No prior art has discovered how to utilize clinical and neurobehavioral information to differentially adjust genomic signatures so that they are tuned for the different uses of general population screening, diagnostic evaluation and prognostic assessment.

In certain embodiments, for screening, weighted gene expression patterns can be used alone or in combination with readily available standard clinical measures (head circumference, age, CSBS scores, and GeoPreference test score) and do not depend on neuroimaging or other tools unsuited to general population screening, while for diagnostic or prognostic use after a child has become suspected of being at risk, weighted gene expression patterns can be used in combination with specialty tools such as MRI or fMRI to optimize diagnostic and prognostic judgments. No prior screening, diagnostic or prognostic art using biological measures is able to accurately classify the great majority of ASD cases at such young ages. In sum, no currently available method matches the present invention in providing a combination of: effectiveness across the youngest ages, from birth to childhood; complex algorithmic use of gene weights, patterns and pathways in combination with clinical and neurobehavioral variables; high accuracy, specificity and sensitivity; and flexible utility in autism screening, diagnostic evaluation and prognostic assessment.

In certain embodiments, the invention provides methods of conducting a weighted gene and feature test of autism (WGFTA) for autism screening, diagnosis or prognosis. The method can include: a) obtaining an analyte from a biological sample to obtain analyte-associated gene expression levels of a set of at least 20 genes selected from a model derived from an autism reference database, such as disclosed in Tables 1 and 2; b) statistically normalizing each expression level of the selected set of genes to derive a normalized gene expression value (NGEV) for each gene in the selected set of the subject; c) preparing a weighted gene signature matrix (WGSM) of the selected gene set; d) calculating a weighted gene expression level of each gene in the selected set by multiplying the NGEV for each gene by a gene-specific weight of that gene, where the gene weights are derived from a computer-based bioinformatic analysis of the relative expression levels of at least the selected set of genes in the autism reference database (including, in certain embodiments, at least 40 healthy individuals and 40 autistic individuals) compiled in a weighted gene expression reference database (WGERD); and e) establishing the divergence of the set of weighted gene expression levels of the subject from the weighted gene expression reference database (WGERD), to thereby conduct the WGFTA to indicate increasing correlation with autism risk, diagnosis or prognosis.
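Steps b), d) and e) can be sketched in Python as follows. The text does not specify the normalization or the divergence measure, so this sketch assumes a per-gene z-score against the pooled reference cohort for the NGEV, and a difference of Euclidean distances to the ASD and control centroids of the WGERD for the divergence. The gene names, weights and reference values are hypothetical toy data, not entries from Tables 1-2.

```python
import math
from statistics import mean, pstdev

Profile = dict[str, float]  # gene symbol -> expression value


def normalize(raw: Profile, ref_mean: Profile, ref_sd: Profile) -> Profile:
    """Step b (assumed z-score): NGEV of each gene relative to the reference cohort."""
    return {g: (raw[g] - ref_mean[g]) / ref_sd[g] for g in ref_mean}


def apply_weights(ngev: Profile, weights: Profile) -> Profile:
    """Step d: weighted expression level = NGEV x gene-specific weight."""
    return {g: ngev[g] * weights[g] for g in weights}


def centroid(profiles: list[Profile]) -> Profile:
    """Gene-wise mean of a list of weighted profiles."""
    return {g: mean([p[g] for p in profiles]) for g in profiles[0]}


def euclidean(a: Profile, b: Profile) -> float:
    return math.sqrt(sum((a[g] - b[g]) ** 2 for g in a))


def wgfta_score(raw: Profile, ref_asd: list[Profile], ref_ctl: list[Profile],
                weights: Profile) -> float:
    """Step e (assumed metric): distance to the control centroid minus distance
    to the ASD centroid in weighted space. Positive => closer to the ASD reference."""
    pooled = ref_asd + ref_ctl
    ref_mean = {g: mean([p[g] for p in pooled]) for g in weights}
    ref_sd = {g: pstdev([p[g] for p in pooled]) for g in weights}

    def to_weighted(profile: Profile) -> Profile:
        return apply_weights(normalize(profile, ref_mean, ref_sd), weights)

    subject = to_weighted(raw)
    c_asd = centroid([to_weighted(p) for p in ref_asd])
    c_ctl = centroid([to_weighted(p) for p in ref_ctl])
    return euclidean(subject, c_ctl) - euclidean(subject, c_asd)


if __name__ == "__main__":
    # Toy WGERD with two hypothetical genes (not genes from Tables 1-2).
    weights = {"GENE_A": 0.8, "GENE_B": -0.6}
    ref_asd = [{"GENE_A": 9.0, "GENE_B": 5.0}, {"GENE_A": 10.0, "GENE_B": 4.0}]
    ref_ctl = [{"GENE_A": 6.0, "GENE_B": 7.0}, {"GENE_A": 7.0, "GENE_B": 8.0}]
    print(wgfta_score({"GENE_A": 9.5, "GENE_B": 4.5}, ref_asd, ref_ctl, weights))
```

Another divergence measure (for example, a classifier trained on the WGERD) could replace `wgfta_score` without changing the `normalize` and `apply_weights` steps.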

Genes that can be tested by the inventive method include those shown in Tables 1 and 2 and 16-25 herein. The genes can be selected based on their weighted relevance to diagnosis or prognosis. These genes involve cell cycle, protein folding, cell adhesion, translation, DNA damage response, apoptosis, immune/inflammation, signal transduction ESR1-nuclear pathway, transcription-mRNA processing, cell cycle meiosis, cell cycle G2-M, cell cycle mitosis, cytoskeleton-spindle microtubule, and cytoskeleton-cytoplasmic microtubule functions. In certain embodiments, genes tested by the inventive method are involved in DNA damage or mitogenic signaling in brain development.

In certain embodiments, the inventive method can use as few as 20 and as many as about 4000 autism WGSM genes (including specific splice variants among these genes), which may be contained within as few as a single gene set or as many as 8 gene sets and subsets. Different sets and subsets can be used to optimize performance under different assay and application circumstances. In certain embodiments, at least 2, 10, 20, 25, 30, 35, 40, 45, 50, 55, 60, 70, 80, 160, 320, 640, or 762 genes, or any number in between, are selected, for example, from the genes in Table 1. Table 1 lists genes for selection in the present methods based on the highest weight ranking; these genes are more frequently associated with ASD diagnosis. The genes may be arranged and selected from among 4 sets, as shown in Table 1, depending upon the commonality of their expression patterns. The top 50 genes, with absolute weights ranging from about 0.50 to 1.00, in sets 1-4 are also listed in Tables 1.1, 1.2, 1.3, and 1.4.
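Selecting a signature by weight rank, as described for Table 1 and Tables 1.1-1.4, amounts to sorting genes by absolute weight and applying a cutoff. A minimal sketch, assuming the weights are held in a dict keyed by gene symbol; the gene names and weights in the usage example are hypothetical.

```python
def select_signature_genes(weights: dict[str, float], n: int,
                           min_abs_weight: float = 0.0) -> list[str]:
    """Return up to n genes ranked by |weight| (ties broken alphabetically),
    keeping only genes whose absolute weight meets min_abs_weight."""
    eligible = [(gene, w) for gene, w in weights.items() if abs(w) >= min_abs_weight]
    eligible.sort(key=lambda gw: (-abs(gw[1]), gw[0]))
    return [gene for gene, _ in eligible[:n]]


if __name__ == "__main__":
    toy_weights = {"G1": 0.95, "G2": -0.72, "G3": 0.31, "G4": -0.55, "G5": 0.12}  # hypothetical
    print(select_signature_genes(toy_weights, n=50, min_abs_weight=0.50))  # ['G1', 'G2', 'G4']
```

Ranking by absolute value keeps both strongly up-weighted and strongly down-weighted genes, matching the "absolute weights" wording above.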

In other embodiments, at least 2, 10, 20, 25, 30, 35, 40, 45, 50, 55, 60, 70, 80, 100, 120, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, or more genes are selected from the gene listings shown in Tables 16 through 25 provided below. In certain embodiments, the genes are unique differentially expressed (DE) genes found in ASD and control toddlers. These genes are, for instance, dysregulated in DNA damage response, mitogenic signaling, and cell number regulation.

In certain embodiments, normalized gene expression values of the signature genes (e.g., Tables 1 and 1.1-1.4) can be used as is, thus without weighting, for the classification of ASD vs. non-ASD. In certain embodiments, using Boosting (see Scoring and Classification methods), three lists of genes were identified with the smallest number of elements that classified subjects with an accuracy of at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%. Sets with 20, 25 and 30 features that can produce at least 70%, at least 75%, and at least 80% correct classification include, but are not limited to, those shown in Table 2 below. In certain embodiments, adjusting the weights of the genes based on the age of the subject is the most important single parameter for improving the accuracy of ASD classification.
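The Boosting procedure itself is described elsewhere (Scoring and Classification methods). As an illustration only, the sketch below uses generic discrete AdaBoost with one-gene decision stumps, a swapped-in textbook technique: each boosting round picks a single gene, so the shortest ensemble prefix that reaches a target accuracy yields a small gene list. It is stdlib-only and, for brevity, measures accuracy on the training data (a real analysis would use held-out or cross-validated accuracy); the toy data and gene names are hypothetical.

```python
import math


def stump_predict(row: list[float], j: int, thr: float, polarity: int) -> int:
    """Predict +1 (ASD) or -1 (non-ASD) from the expression of gene j alone."""
    return polarity if row[j] >= thr else -polarity


def fit_stump(X, y, w):
    """Exhaustively pick the (gene, threshold, polarity) with lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, label, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, polarity) != label)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def adaboost(X, y, rounds):
    """Discrete AdaBoost; returns a list of (alpha, gene index, threshold, polarity)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Up-weight misclassified subjects, down-weight correctly classified ones.
        w = [wi * math.exp(-alpha * label * stump_predict(row, j, thr, pol))
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    score = sum(a * stump_predict(row, j, thr, pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1


def accuracy(model, X, y):
    return sum(predict(model, row) == label for row, label in zip(X, y)) / len(y)


def smallest_gene_list(X, y, gene_names, target, max_rounds=30):
    """Train once, then return the genes used by the shortest ensemble prefix
    whose accuracy reaches the target (None if the target is never reached)."""
    model = adaboost(X, y, max_rounds)
    for k in range(1, len(model) + 1):
        acc = accuracy(model[:k], X, y)
        if acc >= target:
            return sorted({gene_names[j] for _, j, _, _ in model[:k]}), acc
    return None, accuracy(model, X, y)


if __name__ == "__main__":
    # Hypothetical toy data: rows are subjects, columns are two genes.
    X = [[2.0, 0.1], [2.5, 0.9], [3.0, 0.4], [0.5, 0.8], [1.0, 0.2], [0.2, 0.5]]
    y = [1, 1, 1, -1, -1, -1]
    print(smallest_gene_list(X, y, ["GENE_A", "GENE_B"], target=0.99))
```

Because AdaBoost is deterministic here, the ensemble for k rounds is a prefix of the one for more rounds, so training once and scanning prefixes is equivalent to retraining for each k.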

The invention claims the use of gene weights and, optionally, feature signatures as defined below for each of the WGSM genes that, when applied to an individual's actual gene expression, can accurately predict that individual's risk for autism in screening or make accurate autism diagnostic or prognostic classifications about that individual. It can also be used as a diagnostic test for autism for those already known to be at high risk for autism or suspected to have autism or other developmental disorders. It can also provide both a diagnostic classification prediction (autism, not autism) and an estimation of probability-risk for autism or other developmental disorders in newborns, infants, toddlers and young children.

The inventive method can further comprise an earlier step of obtaining an analyte in a biological sample, which refers to physically obtaining the analyte of interest in the biological sample directly from the body of a subject or physically moving a sample that has been previously taken from the subject. The biological sample can include, but is not limited to, blood, cord blood, serum, plasma, cerebrospinal fluid, urine, tears, saliva, mucus, buccal swab, tooth pulp, skin, neuron, and any other bodily fluid, tissue or organ. The biological sample can also include cells obtained and/or derived from the biological samples and/or cell culture, including, but not limited to, stem cells, fibroblasts, induced pluripotent stem (iPS) cells, neuroprogenitor cells, and neural cells. In certain embodiments, the analyte includes, but is not limited to, DNA, RNA, protein, or metabolite in any biological sample. In certain embodiments, the analyte is blood-derived RNA from leukocytes. In certain embodiments, the WGSM applies weights gene-wise to an individual's normalized blood-derived (including a newborn's cord blood-derived) gene expression levels. Therefore, to screen and test for autism in newborns, for example, the WGSM can be applied gene-wise to the individual's cord blood-derived RNA gene expression levels, and algorithms calculate autism risk. The WGSM is used alone or in combination with the other matrices discussed below. The elements contained in each of the other matrices can also be used as predictors in the diagnostic classification or prognostic analysis.

The inventive method may therefore also comprise a comparison of a gene-network (including hub-gene network) signature matrix (GNSM) of the subject to the GNSM autism reference database, to establish a score for autism risk screening, diagnosis or prognosis based on the divergence of the subject's GNSM from the GNSM autism reference database. In certain embodiments, the GNSM comprises interaction patterns of specific gene weights and features calculated from gene-to-gene interactions, including hub-gene interactions. The interaction patterns are calculated based on the relationship or state of a gene with non-genomic features.

The inventive method may also comprise a step of comparing a multi-modal signature matrix (MMSM) of the subject to the MMSM autism reference database, to establish a score for autism risk screening, diagnosis or prognosis based on the divergence of the subject's MMSM from the MMSM autism reference database. In certain embodiments, the MMSM is a matrix containing the quantification of non-genomic features obtained by clinical, behavioral, anatomical, and functional measurements. The non-genomic features comprise, but are not limited to, age, a GeoPreference test score, an MRI/fMRI/DTI test, an ADOS test, or a CSBS test.
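The MMSM divergence could take many forms. As a hedged illustration, the sketch below scores a subject's non-genomic features with a naive-Bayes log-likelihood ratio: each feature is modeled as an independent Gaussian, fit separately to the ASD and control reference cohorts. The feature names (a GeoPreference-style percentage and a CSBS total) and all values are hypothetical.

```python
import math
from statistics import mean, stdev


def fit_reference(rows: list[dict[str, float]],
                  features: list[str]) -> dict[str, tuple[float, float]]:
    """Per-feature (mean, SD) for one reference cohort."""
    return {f: (mean([r[f] for r in rows]), stdev([r[f] for r in rows])) for f in features}


def gaussian_logpdf(x: float, mu: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def mmsm_log_likelihood_ratio(subject: dict[str, float],
                              ref_asd: dict[str, tuple[float, float]],
                              ref_ctl: dict[str, tuple[float, float]]) -> float:
    """Sum over features of log p(x | ASD) - log p(x | control), treating
    features as independent Gaussians (a naive-Bayes assumption).
    Positive values indicate an ASD-like non-genomic profile."""
    score = 0.0
    for f, (mu_a, sd_a) in ref_asd.items():
        mu_c, sd_c = ref_ctl[f]
        score += gaussian_logpdf(subject[f], mu_a, sd_a) - gaussian_logpdf(subject[f], mu_c, sd_c)
    return score


if __name__ == "__main__":
    feats = ["geo_pref_pct", "csbs_total"]  # hypothetical feature names
    asd = fit_reference([{"geo_pref_pct": 70, "csbs_total": 20},
                         {"geo_pref_pct": 80, "csbs_total": 25},
                         {"geo_pref_pct": 60, "csbs_total": 30}], feats)
    ctl = fit_reference([{"geo_pref_pct": 20, "csbs_total": 45},
                         {"geo_pref_pct": 30, "csbs_total": 50},
                         {"geo_pref_pct": 25, "csbs_total": 55}], feats)
    print(mmsm_log_likelihood_ratio({"geo_pref_pct": 75, "csbs_total": 24}, asd, ctl))
```

The independence assumption is a simplification; correlated features such as age and head circumference would in practice call for a joint model.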

In certain embodiments, the invention is unique in utilizing a test based on specific age-weighted and age-change patterns and gene weights of abnormal gene expression (for instance, Weight Sets 1-4 in Table 1) in infants and toddlers with autism confirmed via longitudinal tracking. In certain embodiments, the invention provides a method specifically designed to leverage age-related gene expression differences between autistic and normal individuals in order to indicate probability risk for autism as it occurs at varying ages in the general pediatric population, making this a unique approach. Therefore, in certain embodiments the invention is a test based on the unique multidimensional gene- and age-weighted dataset of autism that is a reference standard for testing new patients/subjects at risk for autism across ages from newborns to young children. Thus, in certain embodiments, it can use age to transform values of elements in the WGSM and GNSM to improve the accuracy of tests for ASD, based on the unique knowledge of how gene expression changes with age (e.g., in the first year of life) in ASD subjects. In certain embodiments, it can use age as a feature in classification (for example, see the Scoring/CLASS identity method below). Presented herein is the first evidence of age-related gene expression changes in any tissue that correlate with ASD at these early ages. In practice, each gene expression element in the WGSM and GNSM will change as a function of age, with functions ranging from age independence to gain or loss of expression with decreasing age. These age-dependent changes were determined, and this information was used to adjust the weighting factor for each gene to an age-appropriate weighting to enhance diagnostic performance at the age of the individual patient.
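The text states that weights are adjusted to age-appropriate values but does not give the functional form. The sketch below assumes each gene has weights defined at a few anchor ages and interpolates linearly between them, holding the end values constant outside the anchored range. Anchor ages, gene names and weights are hypothetical; an age-independent gene simply has equal weights at every anchor.

```python
from bisect import bisect_right


def age_adjusted_weight(age_months: float, anchor_ages: list[float],
                        anchor_weights: list[float]) -> float:
    """Piecewise-linear interpolation of one gene's weight across sorted anchor ages,
    held constant below the first and above the last anchor."""
    if age_months <= anchor_ages[0]:
        return anchor_weights[0]
    if age_months >= anchor_ages[-1]:
        return anchor_weights[-1]
    i = bisect_right(anchor_ages, age_months)  # index of first anchor strictly above age
    a0, a1 = anchor_ages[i - 1], anchor_ages[i]
    w0, w1 = anchor_weights[i - 1], anchor_weights[i]
    t = (age_months - a0) / (a1 - a0)
    return w0 + t * (w1 - w0)


def age_adjusted_weights(age_months: float,
                         weight_table: dict[str, tuple[list[float], list[float]]]) -> dict[str, float]:
    """Age-appropriate weight for every gene in a hypothetical anchor table."""
    return {gene: age_adjusted_weight(age_months, ages, ws)
            for gene, (ages, ws) in weight_table.items()}


if __name__ == "__main__":
    table = {
        "GENE_A": ([12, 24, 36, 48], [0.9, 0.7, 0.5, 0.3]),  # weight declines with age
        "GENE_B": ([12, 48], [0.4, 0.4]),                    # age-independent
    }
    print(age_adjusted_weights(30, table))
```

The resulting weights would then replace the fixed gene-specific weights in step d) of the WGFTA for a subject of that age.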

Moreover, in some embodiments the invention provides a method further comprising a unique step of comparing a collateral feature signature matrix (CFSM) of the subject to the CFSM autism reference database, to establish a score for autism risk screening, diagnosis or prognosis based on the divergence of the subject's CFSM from the CFSM autism reference database. The CFSM comprises features collateral to the subject; for instance, the collateral features comprise analytes in maternal blood during pregnancy, a sibling with autism, maternal genomic signature or preconditions, or adverse pre- or perinatal events.

In some embodiments, the invention further provides a method for autism preclinical screening, diagnosis or prognosis, comprising: a) obtaining a biological sample containing analytes of interest; b) preparing a weighted gene signature matrix (WGSM) comprising expression levels of a selected set of two or more analyte-associated genes selected from the genes listed in Tables 1-2 and 16-25; c) calculating a weighted gene expression level of each gene in the selected set by multiplying a normalized gene expression value (NGEV) of the WGSM by the gene-specific weight of that gene provided in Tables 1-2 and 16-25; and d) establishing the divergence of the set of weighted gene expression levels of the subject from a weighted gene expression reference database (WGERD), to thereby indicate increasing correlation with autism risk, diagnosis or prognosis. In certain embodiments, the WGSM is further processed to reduce dimensionality or computation time and to increase power in the subsequent analysis steps.

In certain embodiments, using functional genomic and biological systems analyses, signatures of blood-derived RNA expression are derived from subjects with and without autism. These signatures are patterns of “gene-specific weights” (the WGSM) as well as patterns of gene-specific weights as a function of gene-gene interaction patterns (the GNSM), quantifiable features of the individual (e.g., age, sex, head circumference, neuroimaging measures, eye-tracking score; the MMSM) and collateral features (e.g., analytes in maternal blood during pregnancy, sibling with autism, adverse pre- or perinatal events; the CFSM). In essence, these genomic signatures transform the measured gene expression levels obtained from an individual through algorithmic and knowledge-based selective application of the derived weighted patterns, which selectively enhance or diminish the impact of the measured levels on detection, diagnostic and prognostic classifications and risk estimates. The non-genomic feature matrices instead function as predictor variables.

In some embodiments, the invention therefore provides the use of these four derived signature matrices, unified as the weighted gene and features matrix (WGFM), which is implemented as the weighted gene and feature tests for autism (WGFTA) for pediatric population screening for risk of autism and for autism diagnostics and prognostics in newborns, babies, infants, toddlers and young children. For example, its prognostic uses include prediction and characterization of likely clinical, neural and treatment progress and outcome. In certain embodiments, the WGFTA uses the following four matrices of the WGFM singly or in any combination: the Weighted Gene Signature Matrix (WGSM), the Gene-Networks Signature Matrix (GNSM), the Multi-Modal Signature Matrix (MMSM), and the Collateral Features Signature Matrix (CFSM). In particular embodiments, some of these signature matrices are designed to optimize, for example, screening for and detection of newborns and babies at risk for autism, while others are designed for use in the clinical evaluation and diagnostic confirmation of babies, infants, toddlers or young children previously identified as being at risk for autism, and still others for use in the prognostic evaluation of probable clinical course (e.g., worsening or improving clinical severity), later clinical outcome (later language, cognitive or social ability), or treatment response.
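Because the WGFTA may use any subset of the four matrices depending on the use case, one simple arrangement is a separate combination model per use case. The sketch below assumes each matrix has already been reduced to a scalar score (for example, by divergence measures like those described above) and combines the scores with a logistic function. The use-case names, intercepts and coefficients are hypothetical placeholders; in practice each configuration would be fit on reference data.

```python
import math

# Hypothetical per-use-case configurations: which matrices are required and how
# their scores are combined. Real coefficients would be fit on reference data.
CONFIGS = {
    "screening": {"intercept": -1.0, "coef": {"WGSM": 1.2, "MMSM": 0.6}},
    "diagnosis": {"intercept": -0.5,
                  "coef": {"WGSM": 1.0, "GNSM": 0.8, "MMSM": 0.7, "CFSM": 0.3}},
}


def wgfta_probability(matrix_scores: dict[str, float], use_case: str) -> float:
    """Logistic combination of per-matrix scores for one use case.
    Scores for matrices the configuration does not use are ignored."""
    cfg = CONFIGS[use_case]
    missing = [m for m in cfg["coef"] if m not in matrix_scores]
    if missing:
        raise ValueError(f"{use_case} requires scores for {missing}")
    z = cfg["intercept"] + sum(c * matrix_scores[m] for m, c in cfg["coef"].items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # In this toy setup, screening uses only WGSM and MMSM scores.
    print(wgfta_probability({"WGSM": 1.5, "MMSM": 0.4}, "screening"))
```

Refusing to score when a required matrix is missing, rather than silently dropping it, keeps a model fit for one use case from being applied as if it were another.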

In some embodiments, the invention also provides a system for autism screening, diagnosis or prognosis, comprising a database-generated model of at least two genes and corresponding gene-specific weights as provided in Tables 1-2 and 16-25, and instructions for applying the database to a weighted gene signature matrix (WGSM) comprising expression levels of a selected set of the same two or more genes expressed in a biological sample by a) calculating a weighted gene expression level of each gene in the selected set by multiplying a normalized gene expression value (NGEV) of the WGSM by the gene-specific weight of that gene provided in Tables 1-2 and 16-25; and b) establishing the divergence of the set of weighted gene expression levels of a subject from a weighted gene expression reference database (WGERD), to thereby indicate increasing correlation with autism risk, diagnosis or prognosis.

The invention is currently the only functional genomic test of autism that is based on direct experimental knowledge of the functional genetic effects and neural outcome defects that underlie brain maldevelopment in autism at varying young developmental ages, and the only autism genetic test that detects a majority of autistic individuals. The invention is platform independent, and has been tested and validated on independent cohorts of patients and by using different methods.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1C. Gene networks are associated with neuroanatomic measure variation and distinguish ASD from control toddlers. FIG. 1A) Total Brain Volume (TBV) measure distributions in ASD and control toddlers. A t-test showed no statistically significant difference between the two distributions (p-value = 0.645). FIG. 1B) WGCNA analysis across all ASD and control subjects together (combined analysis) identified seven modules of co-expressed genes that are associated with neuroanatomic measures (see also Table 5). The bar graph displays the enrichment scores of the seven modules using Metacore pathway analysis. FIG. 1C) The eigengene values of the same seven modules were used in a correlation analysis with the neuroanatomic measures (see Table 6). Overall, six of the seven modules (gene networks) display a statistically significant association with the neuroanatomic measures, but the association was different within each group. The scatter plots provide a graphical representation of the relationship between module eigengenes (gene expression variance) and total brain volume variation in the ASD (light grey) and control (dark grey) groups. The most evident differences between the two groups involve gene patterns in the cell cycle, protein folding and cell adhesion modules. Additional differences are found in the cytoskeleton, inflammation and translation gene modules. High expression levels of cell cycle and protein folding genes are found in normally small brains, while the other gene networks seem to have a weaker effect in keeping the brain from growing in size. Conversely, the combination of reduced cell cycle and protein folding gene expression together with variations in gene expression levels in the other functional networks is found to drive pathological brain enlargement in ASD.

FIGS. 2A-2C. WGCNA analysis of the combined dataset (ASD and control together) defined which modules are associated with Total Brain Volume (TBV) measures in control toddlers and which in ASD toddlers. The impact of gene expression on brain size variation is calculated as Gene Significance (GS) for each TBV-associated module within control (dark grey) and ASD toddlers (light grey). The bar graphs show the difference in GS between the two groups. Negative GS values reflect an inverse relationship between eigengenes and TBV variation (see Table 6), that is, high gene expression levels associated with small brains and vice versa. Solid bars with an asterisk indicate that the association is statistically significant; empty bars without an asterisk indicate that it is not. The correlation between GS and Gene Connectivity (GC; defining hub genes) for each gene within a module displays the change in activity patterns and in the impact of hub genes on brain size variation (left scatterplot for each module). The correlation between GS and Module Membership (MM; the specificity of a gene to its assigned module) displays activity pattern changes consistent with the hub-gene alterations (right scatterplot for each module). Analysis of the top 30 genes for the three network features (GS, GC, MM) showed that GS was the feature with the highest number of altered genes in each module. Overall, the module enriched in translation had the highest number of genes that changed between ASD and control toddlers. FIG. 2A) Genes involving cell cycle and protein folding. FIG. 2B) Genes involving cell adhesion and cytoskeleton. FIG. 2C) Genes involving translation and inflammation.
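The three network features used in FIGS. 2A-2C can be sketched as simple correlations. The Python sketch below is an illustration under stated assumptions, not the analysis used to produce the figures: GS is taken as the signed Pearson correlation between a gene and the trait (for example TBV), MM as the correlation between a gene and its module eigengene, and GC as WGCNA-style intramodular connectivity, the sum of unsigned adjacencies |cor|^β to the other genes in the module. The power β=6, the function names and all example values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def network_features(expr, trait, eigengene, beta=6):
    """Per-gene GS, MM and intramodular connectivity (GC) for one module.

    expr:      dict mapping gene -> expression values, one per subject
    trait:     trait values (e.g. TBV), one per subject
    eigengene: module eigengene values, one per subject
    beta:      soft-thresholding power (assumed; chosen per dataset in practice)
    """
    features = {}
    for g, values in expr.items():
        gs = pearson(values, trait)      # gene significance: signed cor with trait
        mm = pearson(values, eigengene)  # module membership: cor with eigengene
        # connectivity: sum of unsigned adjacencies |cor|^beta to other module genes
        gc = sum(abs(pearson(values, other)) ** beta
                 for h, other in expr.items() if h != g)
        features[g] = {"GS": gs, "MM": mm, "GC": gc}
    return features


# Hypothetical toy module: 3 genes measured in 4 subjects.
expr = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 9], "g3": [4, 3, 2, 1]}
tbv = [10.0, 11.0, 12.5, 13.0]
eigengene = [-1.5, -0.5, 0.5, 1.5]
for gene, f in network_features(expr, tbv, eigengene).items():
    print(gene, {k: round(v, 3) for k, v in f.items()})
```

A gene with high GC (a hub) and high |MM| sits at the core of its module; comparing GS against GC or MM across diagnostic groups is what the scatterplots in FIGS. 2A-2C summarize.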

FIG. 3. Co-expression modules generated from separate WGCNA analyses of control and ASD samples. The absolute GS values for control-based modules (left) are consistent with the modules from the combined analysis within the control group (FIGS. 2A-2C). The absolute GS values for ASD-based modules (right) are consistent with the combined analysis within the ASD group (FIGS. 2A-2C) and show an increase in the number of modules associated with TBV measures. The differences in TBV-associated modules are hence accentuated in the separate WGCNA analyses.

FIGS. 4A-4E. Pathway-based replication analysis of differentially expressed (DE) genes. A module-based classifier efficiently distinguishes ASD from control subjects, and its genes show a high number of protein-protein interactions (PPIs), enriched in translation genes. FIG. 4A) Pathway enrichment comparison in Metacore between the Discovery and Replication DE genes. DNA-damage and mitogenic signaling share the strongest similarity. FIG. 4B) Pathway enrichment analysis of the genes commonly dysregulated in both Discovery and Replication samples. FIG. 4C) Left panel: ROC curves and AUC values from the classification of Discovery (ROC 1) and Replication (ROC 2) subjects. Right panel: ROC curves and AUC values from the classification of all subjects in the different diagnostic categories. ROC 3=ASD vs typically developing (TD) toddlers (thus excluding contrast subjects); ROC 4=ASD vs contrast toddlers; ROC 5=contrast vs TD toddlers. FIG. 4D) Coordinates extracted from all ROC curves in FIG. 4C. FIG. 4E) Cytoscape visualization with the PanGIA module style using the genes from the four modules with direct PPIs (DAPPLE database). The number of interactions is correlated with color and position within the network. White indicates <8 PPIs; yellow to red indicates 8≦PPI<31. The core of the network, represented by the genes with the highest number of interactions, is enriched with translation genes.
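The AUC values and ROC coordinates reported for FIGS. 4C-4D can be computed directly from classifier scores. In the sketch below, the AUC is the probability that a randomly chosen ASD subject receives a higher score than a randomly chosen comparison subject (ties count one half), which equals the area under the empirical ROC curve. It assumes higher scores mean "more ASD-like" and labels coded 1 (ASD) and 0 (comparison); the function names and example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(false positive rate, true positive rate) at each distinct score threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical scores for two ASD (1) and two comparison (0) subjects.
print(roc_auc([0.9, 0.4, 0.7, 0.2], [1, 1, 0, 0]))
print(roc_points([0.9, 0.4, 0.7, 0.2], [1, 1, 0, 0]))
```

The pairwise count is quadratic in cohort size, which is fine for cohorts of a few hundred subjects; a rank-based formula gives the same value faster.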

FIG. 5. WGCNA analysis across ASD and control toddlers. Co-expression modules are generated and color-coded (here shown in grey scale). Each vertical line corresponds to a gene, and genes with similar expression are clustered into modules. Modules are named here by their assigned WGCNA default colors. Module eigengenes are computed for each subject and each module.

FIGS. 6A-6B. Correlation analysis between modules and neuroanatomic measures using WGCNA on all discovery subjects. p-values are in parentheses. Dx=diagnosis, L=Left, R=Right, CB=Cerebrum, CBLL=Cerebellum, GM=Gray Matter, WM=White Matter, TBV=Total Brain Volume, hemi=hemisphere, SA=Surface Area, BS=Brain Stem. FIG. 6A) MEDARKRED-MESALMON. FIG. 6B) METAN-MEGREY.

FIG. 7. Gene Significance (GS) to Gene Connectivity (GC) correlation within each module in the ASD and control groups. Of the 22 co-expressed modules across groups, 12 displayed a severe change in pattern direction (from negative to positive, or to a non-significant correlation), while 4 modules had a modest change in correlation (same direction).

FIGS. 8A-8B. Association analysis between modules and neuroanatomic measures using WGCNA on control toddlers. L=Left, R=Right, CB=Cerebrum, CBLL=Cerebellum, GM=Gray Matter, WM=White Matter, TBV=Total Brain Volume, hemi=hemisphere, SA=Surface Area, BS=Brain Stem. FIG. 8A) MEMAGENTA-MEGREENYELLOW. FIG. 8B) MEGREY60-MEGREY.

FIGS. 9A-9B. Association analysis between modules and neuroanatomic measures using WGCNA on ASD toddlers. L=Left, R=Right, CB=Cerebrum, CBLL=Cerebellum, GM=Gray Matter, WM=White Matter, TBV=Total Brain Volume, hemi=hemisphere, SA=Surface Area, BS=Brain Stem. FIG. 9A) MELIGHTGREEN-MEDARKRED. FIG. 9B) MEGREENYELLOW-MEGREY.

FIG. 10. WGCNA analysis across ASD and control toddlers using the differentially expressed genes. Co-expression modules are generated and color-coded (here shown in grey scale). Each vertical line corresponds to a gene, and genes with similar patterns are clustered into modules. Modules are named here by their assigned WGCNA default colors. Module eigengenes are computed for each subject and each module.

FIG. 11. Plot of classifier prediction performance relative to subject age: distribution of subject ages, separated by classifier accuracy.

FIG. 12. Plots of prediction performance against age-corrected total brain volume (TBV), whole cerebrum and cerebellum measures.

FIGS. 13A-13C. Age- and diagnosis-related gene expression profiles. FIG. 13A) Example of change in gene expression with a main effect of diagnosis (ASD in light grey vs control in dark grey). FIG. 13B) Example of change in gene expression with main effects of age and diagnosis. FIG. 13C) Example of change in gene expression with an interaction between age and diagnosis.

FIGS. 14A-14B. Inclusion of age in the classification analysis using boosting. FIG. 14A) Graphical representation of the classification outcome in the training set (continuous line) and after cross-validation (dotted line) with age as an additional predictor. FIG. 14B) Graphical representation of the classification outcome without age as a predictor. When age is used as an additional predictor, the cross-validation error diminishes from about 0.3 (30%) to about 0.2 (20%), suggesting that age helps improve classification accuracy.

FIG. 15. Diagram representing the splits of a decision tree classifier (left panel) for ASD (+1) and control (−1), and the feature space that is recursively divided into finer sub-regions according to the number of features used (right panel).

FIG. 16. Diagram representing a boosting algorithm (for example, AdaBoost), which fits a baseline classifier and uses its performance on the training data to re-weight the importance of each point in subsequent fits.
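The re-weighting loop of FIG. 16, with the decision-tree splits of FIG. 15 reduced to single splits, can be sketched as discrete AdaBoost over depth-one trees (stumps). This is an illustrative Python sketch, not the boosting implementation behind FIGS. 14 and 17: it assumes +1 (ASD) and −1 (control) labels as in FIG. 15, stumps as the baseline classifier, and a hypothetical feature matrix in which any column (for example age, as in FIGS. 14A-14B) can act as a predictor.

```python
import math


def fit_stump(X, y, w):
    """One-feature threshold rule (a depth-1 tree) minimising weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (sign if xi[j] >= t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best


def stump_predict(stump, x):
    _, j, t, sign = stump
    return sign if x[j] >= t else -sign


def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost; labels are +1 (ASD) or -1 (control)."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform point weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep log() finite
        if err >= 0.5:                     # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Re-weight: misclassified points gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data: column 0 a gene value, column 1 a second predictor.
X = [[1.0, 0.2], [2.0, 0.1], [3.0, 0.9], [4.0, 0.8]]
y = [-1, -1, 1, 1]
model = adaboost(X, y)
print([predict(model, x) for x in X])
```

Each round's alpha grows as the stump's weighted error shrinks, so accurate splits dominate the final vote; the cross-validation errors quoted for FIGS. 14 and 17 would come from scoring held-out subjects with such an ensemble.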

FIG. 17. Boosting classification performance using 25 genes of the signature matrix. The cross-validation error is about 25%, giving a classification accuracy of about 75%.

DETAILED DESCRIPTION OF THE INVENTION

Various publications, including patents, published applications, technical articles and scholarly articles, are cited throughout the specification. Each of these cited publications is incorporated by reference herein in its entirety.

Throughout this specification, the word “comprise” or variations such as “comprises” or “comprising” will be understood to imply the inclusion of a stated integer (or component) or group of integers (or components), but not the exclusion of any other integer (or component) or group of integers (or components).

The singular forms “a,” “an,” and “the” include the plurals unless the context clearly dictates otherwise.

The term “including” is used to mean “including but not limited to.” “Including” and “including but not limited to” are used interchangeably.

In some embodiments, the invention provides the use of functional genomic signatures in combination with functional genomic-based multimodality signatures in screening for autism risk and in autism diagnostics and prognostics. The multimodality signatures include, but are not limited to, physical, neurobehavioral, neuroimaging, neurophysiological, clinical history, genetic, maternal precondition, parent questionnaire, family history, and behavioral and psychometric test information, and are derived from bioinformatic and biological systems analyses of analytes collected in vivo from peripheral tissues including cord blood, blood, skin and urine. The invention specifically provides the use of varying forms of such signatures, each tailored to optimize autism screening, diagnostics and prognostics according to the individual's age, sex, ethnicity, and clinical and family history, thus providing pediatric population screening biomarkers of risk for autism, as well as diagnostic and prognostic biomarkers of autism (i.e., Autism Spectrum Disorders or ASD as defined in DSM-5 and broadly characterized in DSM-IV-TR) and of risk for autism in individuals at young ages, including newborns, babies, infants, toddlers and young children. Prognostic biomarkers as used herein include those that predict and characterize likely clinical, neural and treatment progress and outcome.

In certain embodiments, the invention can test for risk of autism in any newborn, infant, toddler or young child. The functional genomic and functional genomic-based multimodal signatures presented here, developed from general pediatric populations at young ages, have far better accuracy, specificity and sensitivity than any previously developed biological- or behavior-based screen or early classifier for ASD in newborns, 0 to 1 year olds, 1 to 2 year olds, 2 to 3 year olds and 3 to 4 year olds. In particular embodiments, the invention provides computer-based bioinformatics analyses that have derived genomic and genomic-based multimodal signatures in vivo that efficiently predict autism at very young ages.

Because autism is a strongly genetic disorder of neural development, a major breakthrough in autism risk assessment would be the ability to identify functional genomic defects that relate to, and may underlie, brain development in autism at the youngest possible ages. From such gene-brain knowledge, better and more autism-relevant biomarkers of early risk should be obtainable. Therefore, in some embodiments the invention provides unique analyses, not previously performed by any other researchers in the autism field, that identified functional genomic defects in blood leukocyte mRNA that are strongly correlated with brain and cerebral cortex developmental size in very young autistic subjects. In certain embodiments, the invention provides that, among the genes so involved, a large percentage are also abnormally dysregulated compared with typically developing control infants and toddlers. This result is the first identification of a functional genomic pathology in the first years of life in autism. Using bioinformatics and systems biology analyses, the invention provides functional genomic and functional genomic-based multimodality signatures (the weighted gene and features matrix) for autism screening, diagnosis and prognosis, which are used in the weighted gene and feature tests of autism (WGFTA) of the invention.

The WGFTA of the invention detects, quantifies risk of, and classifies autism and other developmental disorders at the youngest ages in the general pediatric population with greater accuracy, specificity, sensitivity and positive predictive value than any other published method. These are the first clinically relevant, brain development-relevant and practical genomic signatures of risk for autism in newborns, infants, toddlers and young children. This set of signatures detects subtypes of autism with more severe as well as less severe involvement. As such, the WGFTA supports identification of those with more severe neuropathology and reveals differential prognosis. Moreover, repeat testing with the WGFTA enables tracking and understanding of longitudinal changes in autism neural and clinical pathology across development. Not only does the WGFTA set of tests have substantial clinical impact at the level of the individual child (a first in the autism field), but the invention also impacts studies linking genetic and non-genetic etiological variables in this disorder.

More detailed descriptions of the WGFTA and the associated signature matrices are provided below. In certain embodiments, the invention provides the weighted gene and feature tests of autism (WGFTA), which apply any single one, or any combination, of the following matrices, unified under the name “Weighted Gene and Features Matrix” (“WGFM”):

The weighted gene signature matrix (WGSM) is a matrix containing sets of genes and gene weights, which constitutes a model of the reference dataset. In certain embodiments, gene weights are derived from a computational bioinformatics analysis of the relative expression levels of at least the selected set of genes from more than 40 healthy individuals and 40 autistic individuals, compiled in a weighted gene expression reference database (WGERD). In certain embodiments, the invention provides a WGSM and/or WGERD with at least 2, 10, 20, 25, 30, 35, 40, 50, 60, 70, 80, 160, 320, 640, 762, 800, 900, 1,000, 1,500, 2,000, 2,500 or more genes, or any number of genes and their respective weights determined as described and exemplified herein. The genes can be arranged into sets of common expression patterns; as an example, 4 sets are shown in Table 1. In certain embodiments, the reference database WGERD of the invention is designed to be constantly updated with new subjects and additional features (e.g., sequencing data) so that the genes and gene weights, as well as non-genomic features, can be updated accordingly.

The weights of genes provided in Table 1 can be rounded to the nearest 1/10; 1/100; 1/1,000; 1/10,000; 1/100,000; 1/1,000,000; 1/10,000,000; or 1/100,000,000. The genes are provided in ranked order of their weighted correlation, as shown in Tables 1.1 through 1.4.
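As an illustration of how a WGSM might be held in memory, the sketch below stores each gene set as a mapping from gene ID to weight, ranks genes by absolute weight (the ordering used in Tables 1.1 through 1.4) and optionally rounds each weight to the nearest 10^−d. The dictionary layout and the function name `ranked` are assumptions; the few weights shown are copied from Table 1.

```python
# Hypothetical in-memory WGSM: set name -> {gene ID: weight}.
# Weights are copied from Table 1 (only a few genes per set are shown).
wgsm = {
    "set 1": {"CD3D": 0.935783593, "C2orf3": 0.643325498, "AFF1": -0.784337538},
    "set 2": {"CUTL1": 0.899346773, "MAST3": 0.891222232, "GTF3C6": -0.795216188},
    "set 4": {"SDPR": 0.982476, "PDE5A": 0.936387},
}


def ranked(weights, decimals=None):
    """Genes ordered by |weight|, largest first; optionally rounded to 10**-decimals."""
    order = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
    if decimals is not None:
        order = [(gene, round(w, decimals)) for gene, w in order]
    return order


print(ranked(wgsm["set 2"]))              # negative weights rank by magnitude
print(ranked(wgsm["set 1"], decimals=3))  # weights rounded to the nearest 1/1,000
```

Ranking on the absolute value is why negatively weighted genes such as GTF3C6 appear among the top genes of set 2 in Table 1.2.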

One computer-based bioinformatic algorithm used to determine the weighted values is part of the Weighted Gene Co-expression Network Analysis (WGCNA) package in the R computing environment (cran.us.r-project.org/). The use of this package is also described in the methods of Examples 1 and 2 (see below). Quantification of gene expression levels, and therefore weight calculation, is platform and method independent. Microarray-based platforms (for instance, Affymetrix, Illumina or Nimblegen chips), sequencing-based reactions (for instance, Illumina or Roche next-generation sequencing, or traditional Sanger sequencing) and any other quantitative approach (for instance, qPCR-based methods such as the Fluidigm system) can be used to determine nominal gene expression levels, together with any of the weight calculation methods described herein. Using recommended settings in the WGCNA package, cleaned and normalized gene expression data are clustered into gene sets (herein called modules) based on similarity of co-expression. Genes with similar expression patterns across subjects are assigned to a specific module. For each module and each subject, an eigengene is then calculated. The eigengene values are calculated with the R function “moduleEigengenes(data, colors)”, where “data” is the variable containing the gene expression values of all subjects and “colors” holds the module assignment of each gene. This step is equivalent to conventional principal component analysis, in which the variance of a multi-dimensional dataset (many genes) is represented by one value (component 1, or the eigengene value). The weights are then calculated using the “cor( )” function in R with “data” and the eigengenes as arguments. This function performs a correlation analysis between a module eigengene value and the expression value of each gene within the same module. Correlations are performed for all genes in a module and for all modules. Using this method, weight values range from −1 to 1 and represent the contribution of each gene to the overall gene expression variance of its module.
Genes with weight values closer to −1 or 1 have the highest contribution, and thus the highest importance. Weights can also be calculated using other analogous data-reduction methods that may or may not include a priori clustering steps, as in the case of WGCNA (based on co-expression). Examples are Principal Component Analysis (PCA), Multi-Dimensional Scaling (MDS), and Independent Component Analysis (ICA). In these methods, weights are commonly referred to as “loadings”. Weight calculation also extends to the use of biological information, such as protein-protein interactions (PPIs), gene-to-gene interactions (GGIs), Gene Ontology (GO) information, and network or ranking position; therefore, both statistical and biology-based methods can be applied to derive weights/loadings from gene expression data.
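The eigengene-then-correlation procedure described above can be mimicked outside R. The Python sketch below is an assumption-laden stand-in for WGCNA's eigengene and `cor( )` steps, not WGCNA's own code: it standardizes each gene, obtains the first principal component of one module by power iteration, orients its arbitrary sign along the average standardized expression (one common convention), and then correlates every gene with that eigengene to obtain weights in [−1, 1]. The function names and the toy data are hypothetical.

```python
import math
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def standardize(v):
    """Center to mean 0 and scale to unit (sample) standard deviation."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / (n - 1))
    return [(a - m) / sd for a in v]


def module_eigengene(expr, n_iter=500, seed=0):
    """First principal component scores (one per subject) of one module.

    expr: dict gene -> expression values across subjects.
    """
    Z = [standardize(v) for v in expr.values()]    # genes x subjects
    g, s = len(Z), len(Z[0])
    # Gene-by-gene cross-product matrix, proportional to the correlation matrix.
    C = [[sum(a * b for a, b in zip(Z[i], Z[j])) for j in range(g)] for i in range(g)]
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(g)]      # generic start vector
    for _ in range(n_iter):                         # power iteration -> top eigenvector
        u = [sum(C[i][j] * v[j] for j in range(g)) for i in range(g)]
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    scores = [sum(v[i] * Z[i][k] for i in range(g)) for k in range(s)]
    # PC signs are arbitrary: orient the eigengene along the average expression.
    avg = [sum(Z[i][k] for i in range(g)) / g for k in range(s)]
    if sum(a * b for a, b in zip(scores, avg)) < 0:
        scores = [-x for x in scores]
    return scores


def gene_weights(expr, eigengene):
    """Weight of each gene = its correlation with the module eigengene (-1..1)."""
    return {gene: pearson(v, eigengene) for gene, v in expr.items()}


# Hypothetical module of three genes in four subjects.
expr = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 9], "g3": [4, 3, 2, 1]}
print(gene_weights(expr, module_eigengene(expr)))
```

Because the eigengene is the first principal component, the weights obtained this way play the same role as PCA loadings, which is the correspondence noted in the paragraph above.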

The present invention provides, for the first time, the use of weights in the screening and diagnosis of autistic subjects, especially at young ages (birth to age 4 years). Autism involves disrupted hub genes and gene pathways, sub-networks, networks and modules (see EXAMPLE 1), and the patterns of less to more abnormal gene expression within these systems are encoded and used for autism screening and diagnostics in the invention. In some embodiments, this is done via PPIs and/or GGIs as just stated, and such pattern information is contained in the GNSM. In other embodiments, the patterns of less to more abnormal gene expression are encoded by gene weights. This gene weighting improves performance and can be used in combination with classifying genes into modules or independently of modules. Similarly, clustering the genes into modules can be used alone or in combination with the GNSM, MMSM, and CFSM.

The study of the unique reference dataset of ASD and control infants and toddlers provided the opportunity to discover the importance levels of genes (from low to high priority) for the identification of autism risk. Based on the biological information present in our reference dataset, genes with higher priority are more important in correctly classifying ASD patients. Priority was assigned based on the weight value calculated for each gene. As described above, genes with weight values closer to −1 or 1 have the highest contribution (and thus importance). Therefore, gene lists can be selected based on the weight values. For example, in some embodiments, a gene list can be selected from genes with an absolute weight value of 0.15 to 0.4, from 0.5 to 0.7, or above 0.8. In certain embodiments, a gene list can be generated by selecting genes with an absolute weight value above 0.15, above 0.20, above 0.25, above 0.30, above 0.35, above 0.40, above 0.45, above 0.50, above 0.55, above 0.60, above 0.65, above 0.70, above 0.75, above 0.80, above 0.85, above 0.90, above 0.91, above 0.92, above 0.93, above 0.94, above 0.95, above 0.96, above 0.97, above 0.98, or above 0.99. The genes can be selected with or without using clustering to define particular modules before applying the weighting. The four “Top 50” gene sets (Tables 1.1 through 1.4) show absolute weights ranging from approximately 0.53 to 0.98 for the top 50 genes of four different modules. Alternatively, the absolute weights can be used as a threshold (with or without clustering) to determine the number of genes having a weight above, for example, any of the absolute weights listed above.
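Selecting a gene list by absolute weight, as in the thresholds and ranges above, reduces to a filter on |weight|. In this sketch, "above" is treated as strictly greater and ranges as inclusive at both ends; these boundary choices are assumptions, as are the function names. The example weights are copied from Table 1 and pooled across sets, which corresponds to selection without clustering.

```python
def select_above(weights, threshold):
    """Genes whose absolute weight exceeds `threshold`, ranked by |weight|."""
    hits = [g for g, w in weights.items() if abs(w) > threshold]
    return sorted(hits, key=lambda g: abs(weights[g]), reverse=True)


def select_between(weights, low, high):
    """Genes whose absolute weight lies in the closed range [low, high]."""
    hits = [g for g, w in weights.items() if low <= abs(w) <= high]
    return sorted(hits, key=lambda g: abs(weights[g]), reverse=True)


# A few weights copied from Table 1 (pooled across gene sets).
weights = {"SDPR": 0.982476, "CD3D": 0.935783593, "GTF3C6": -0.795216188,
           "AFF1": -0.784337538, "C2orf3": 0.643325498, "TOX2": 0.157468472}

print(select_above(weights, 0.8))          # genes with |weight| above 0.8
print(select_between(weights, 0.5, 0.7))   # genes with |weight| from 0.5 to 0.7
```

Applying the same filter within each module's weights, instead of the pooled dictionary, gives the "with clustering" variant.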

TABLE 1 Genes and weights of four gene sets. Columns, left to right: GeneID_set 1, Weights Set 1, GeneID_set 1 (continued), Weights Set 1 (continued), GeneID_set 2, Weights Set 2, GeneID_set 3, Weights Set 3, GeneID_set 4, Weights Set 4. Values are listed row by row; once a set is exhausted, its columns are omitted from subsequent rows.
CD3D 0.935783593 C2orf3 0.643325498 CUTL1 0.899346773 LOC44926 0.952217397 SDPR 0.982476 UXT 0.899396969 PDCL 0.641692741 MAST3 0.891222232 ITM2B 0.918889397 PDE5A 0.936387 RPS4X 0.897584727 ZNF544 0.641399133 STK4 0.858164628 HOXC6 0.899386247 PTGS1 0.885464 LOC283412 0.893348386 PAK1IP1 0.641392622 KIAA247 0.842925274 LOC392288 0.895514994 CTDSPL 0.883295 LOC127295 0.891939246 LOC4455 0.639352929 MYH9 0.829782866 YIPF4 0.891398192 CTTN 0.88314 LOC42694 0.891528532 PAQR8 0.639332216 RAPGEF2 0.817284111 RBMS1 0.888876658 ALOX12 0.872153 SKAP1 0.891256553 TMEM5B 0.638786733 ARAP3 0.815255748 USP6 0.871882682 MPL 0.867286 LOC72882 0.889935118 C22orf32 0.636925896 RAB11FIP1 0.797221635 KIAA133 0.868831295 DNM3 0.856197 LOC645173 0.888169553 CXCR7 0.63651849 WBP2 0.796352118 LOC642567 0.867721395 C1orf47 0.848722 RPL23A 0.887966784 RTBDN 0.635348948 GNAI2 0.795553329 EVI2B 0.865616513 C7orf41 0.828971 LOC646942 0.883945464 EEF1G 0.634822364 MTMR3 0.795286983 UBE2W 0.851386357 C5orf4 0.827413 LOC646294 0.881255183 RPL37 0.633756743 CBL 0.792889746 DDX3X 0.849399278 RAB27B 0.815966 LOC728428 0.881254479 KIAA355 0.63224944 UBE4B 0.792663385 UBE2D1 0.844426485 CXorf2 0.811991 LOC44737 0.871745735 MRPS27 0.631945943 IGF2R 0.791413796 HIAT1 0.841479672 GRAP2 0.797727 LOC7329 0.871551112 SSR4 0.629113287 YPEL3 0.789244232 TTRAP 0.837658199 CDC14B 0.782988 LOC391833 0.868862845 TOMM7 0.628843995 SETD1B 0.784239228 LOC44525 0.83588163 DAB2 0.771423 RPS3 0.867774287 LOC1131672 0.628653337 PIK3CD 0.782776396 C18orf32 0.828614134 TAL1 0.755586 RPL36 0.864561634 KRT73 0.628159669 RASSF2 0.775818951 LOC1132888 0.826453522 NCALD 0.747679 LOC1127993 0.86394855 POLR1D 0.625815554 KDM6B 0.774456479 ROCK1 0.821534228 ITGB5 0.74494 LOC73187 0.862255251 INPP4B 0.625699386 TP53INP2 0.769228192 LOC64798 0.818875634 GUCY1A3 0.732784 LOC72831 0.861398415 ALKBH7 0.623546371 NUAK2
0.764937137 FAM91A2 0.8151116 FERMT3 0.725864 LOC653162 0.857976375 AKR7A3 0.622561242 PAK2 0.764551187 SENP6 0.81341925 TSC22D1 0.725234 LOC729679 0.856634917 OGFOD1 0.622236454 MYO9B 0.758257515 LOC732229 0.812945123 LIMS1 0.722976 LOC441246 0.856563215 COX7A2L 0.622161758 NDE1 0.757911755 CEP63 0.812863172 SLC8A3 0.721372 LOC387841 0.853626697 SNORD16 0.619846554 IRS2 0.748758318 ATG3 0.811914569 ABCC3 0.716486 C13orf15 0.853475792 PRKCA 0.619788175 PHF2 0.747211227 LOC1128269 0.79772624 HOMER2 0.713716 LOC728576 0.852817639 MAN1C1 0.617174926 MAP2K4 0.746288868 PLAGL1 0.79624413 NAT8B 0.712372 EIF3K 0.851382429 COX11 0.617173924 CAMK1D 0.743845616 MBD2 0.794667574 FBLN1 0.695683 EEF1B2 0.848139827 EDAR 0.616832496 CDC2L6 0.739975446 EXOC8 0.789627347 ARHGAP21 0.688976 LCK 0.847839497 SMYD2 0.615536584 ASAP1 0.734296313 MRRF 0.788483797 C21orf7 0.688378 LOC39345 0.846855595 C2orf196 0.615182754 TSC22D3 0.729781326 LOC113377 0.785194167 C15orf52 0.687782 RPL4 0.846548149 ACYP2 0.61357462 TLN1 0.728642978 POTE2 0.784824825 CABP5 0.682826 LOC1132742 0.842833281 GCET2 0.613272559 ANXA11 0.727162993 C8orf33 0.783663596 ENDOD1 0.663152 EIF3H 0.842637899 SNORD13 0.612937581 EP3 0.726274852 LOC38953 0.78121382 SOCS4 0.66279 CD27 0.842195295 C1orf14 0.612368536 ROD1 0.725977368 CPEB3 0.773273834 C15orf26 0.644173 RPS15 0.841763438 LOC647276 0.611266653 RXRA 0.725773276 C6orf211 0.769958713 PVALB 0.638495 LOC649447 0.839379287 PLEKHF1 0.598224876 RASSF5 0.721366824 LOC1128533 0.769836835 SLC24A3 0.637579 LOC1131713 0.838956452 FKBP14 0.598185865 PELI2 0.719737185 LOC648863 0.769661334 HGD 0.635255 LOC286444 0.837678151 FOXO1 0.597245265 SEMA4D 0.717738781 STX7 0.767879114 ZNF185 0.628879 LOC729789 0.837517553 LOC339352 0.594815425 PPM1A 0.716812485 SEPT14 0.764666259 CA2 0.624763 RPL1A 0.828892451 ZNF395 0.594235542 CREBBP 0.716647465 LOC4493 0.763746743 CXCL5 0.618479 CD6 0.826562257 DSTN 0.592351455 LAPTM5 0.716353697 LOC442319 0.763346856 GRB14 0.617611 LOC646766 0.824653268 RPS29 0.591862228
CABIN1 0.715925162 NCRNA81 0.751898522 VWF 0.611157 C17orf45 0.823936864 SNORD21 0.591444476 PLCB2 0.715345575 CLEC7A 0.744233541 DKFZp686I15217 0.599262 CUTA 0.823637548 LOC64663 0.589558195 WNK1 0.711353632 CSNK1A1L 0.733761775 NDUFS1 0.593178 EIF3F 0.82286832 TBCA 0.588271553 BCORL1 0.698292888 LOC643896 0.731569432 GRASP 0.581414 LOC642741 0.822312667 PLAG1 0.586638621 SIK3 0.697558261 P74P 0.725849778 RGS18 0.572236 LOC388339 0.821936639 TTC39C 0.585818195 SLC44A2 0.696529915 GABARAPL2 0.723517197 C16orf68 0.562993 RPS14 0.821669767 ZNF16 0.585192385 EPOR 0.692878472 FCGR3A 0.717268353 MGC135 0.552543 LOC11398 0.818423753 LOC645233 0.584896119 SP2 0.686587522 LOC65638 0.714919235 LOC64926 0.548172 LOC643531 0.818367581 CENPL 0.58453599 IP6K1 0.686339387 FAM126B 0.713524823 HIST1H2AE 0.53314 LOC642357 0.815418616 XYLT2 0.583954832 LPIN2 0.686253547 TOP1P2 0.711774489 TCEA3 0.472277 LOC4455 0.815254917 TSPAN5 0.581829618 TGFBR2 0.681731345 TFEC 0.697721596 MEIS1 0.453958 RPS5 0.81396843 LOC4464 0.578616713 MYST3 0.67599148 HERPUD2 0.692843953 MSRB3 0.448888 PIK3IP1 0.812422946 HABP4 0.578161674 MID1IP1 0.675927736 RPAP3 0.689713938 DNHD2 0.448113 RPL5 0.799548493 NHP2 0.577712263 AHCTF1 0.675368429 LOC644964 0.688291553 IRX3 0.396578 FLT3LG 0.798617496 SELM 0.571396694 CHES1 0.675156518 LOC391769 0.673227357 SPG21 0.389869 ATXN7L3B 0.798521571 DCXR 0.56883363 MAP1LC3A 0.673939379 BRD7P2 0.664481299 SPC25 0.374118 DKFZp761P423 0.797275569 PHB 0.56679772 KDM5B 0.673634194 ANP32A 0.662291765 POLR1E 0.795479112 CD32 0.565674142 ZYG11B 0.673297864 LOC641992 0.647881441 C2orf89 0.794392985 DLEU1 0.564859273 POLR2A 0.665496976 PAPSS2 0.637828688 C11orf2 0.793512166 DUSP14 0.562495337 AKT1 0.663541972 LOC1128627 0.637538136 FAM1A4 0.793287257 MSX2P1 0.559554447 TBL1X 0.662364885 KRT8P9 0.63712914 LDHB 0.791745887 RNF144A 0.559297465 IMPA2 0.65781512 TMX4 0.612694353 LOC73196 0.791625893 AHCY 0.558954772 ATG2A 0.654245217 LOC64552 0.59882446 LOC44927 0.789154863 FAM134B 0.558375382 MAPKAPK2
0.653979578 LOC389286 0.596113393 TNFRSF25 0.786317228 TYSND1 0.556848766 FAM11B 0.649818256 CWC22 0.592755277 ZNF329 0.782992446 LOC728953 0.554168254 CENTB2 0.648915988 SH3BP2 0.55771393 LOC644464 0.779129219 LOC387791 0.551536874 RFX1 0.648867183 LAPTM4A 0.551788533 RAB33A 0.776193173 SELPLG 0.549789855 SPI1 0.642942512 SYTL2 0.499767546 RPL22 0.775782518 KLRB1 0.548467766 ZNF281 0.641915681 ANP32C 0.378151333 LOC388564 0.774155475 ATP5E 0.547577933 USP9X 0.641791596 LOC1134291 0.277985424 C6orf48 0.772942779 TCP1 0.547495293 DPEP2 0.641158453 LARP1 −0.312422458 DDHD2 0.772697886 ZDHHC9 0.544612934 PACS1 0.636214668 C18orf1 −0.315553843 PKIA 0.771777911 CCDC72 0.543531769 GATAD2B 0.631961987 TCEAL4 −0.394177985 C11orf1 0.77146654 RNF144 0.543479417 MGC42367 0.631548612 SDHAF1 −0.415292518 RWDD1 0.769315667 MARCKSL1 0.543422113 PJA2 0.629172534 CCDC9A −0.418249658 LOC389342 0.769266259 GPX4 0.541737879 BRD3 0.628793665 ODC1 −0.488451364 CA5B 0.768742497 VSIG1 0.539617567 KIDINS22 0.622713163 ARHGAP1 −0.495153647 DAP3 0.765349952 DHRS3 0.538953789 FAM12A 0.59691644 TADA1L −0.517862143 ATPGD1 0.765166323 CNNM3 0.537386642 RAB11FIP4 0.596547435 LOC92249 −0.579379826 C12orf65 0.764854517 FBLN2 0.535467587 OSBPL8 0.593855675 CD99 −0.59333825 ATP5A1 0.7645682 ELOVL4 0.535114973 CCNK 0.592217195 HCST −0.625513721 IL27RA 0.763477657 PRRT3 0.534237637 SGK 0.588593659 TRAPPC4 −0.643976448 ORC5L 0.762996289 VHL 0.532395335 PCBP2 0.586773694 EIF2AK1 −0.644486837 MFNG 0.761418624 HNRNPU 0.531745499 SNORA28 0.584584438 CS −0.653859524 APOA1BP 0.759114222 FCGBP 0.527263632 C14orf43 0.573927549 LOC1128731 −0.654961437 USP47 0.758717998 GOLPH3L 0.527213868 ELMO1 0.571788753 ILVBL −0.655857192 PEX11B 0.754628868 LMNB2 0.524692549 TMCC1 0.566173385 SETD1A −0.662596368 CRBN 0.754152497 CCT3 0.524567526 DGCR8 0.564982984 LOC4948 −0.724984343 C12orf29 0.753564787 CRIP1 0.52227375 NCOR2 0.563615666 TTC4 0.752585135 ZFP3 0.517155756 UBAP2L 0.558982967 C1QBP 0.752379867 PEBP1 0.515338931 PRKCB 0.556183699 LOC728128
0.751472664 SEPT9 0.514442369 SEC16A 0.555783769 GDF11 0.74939769 TSTD1 0.51172194 C13orf18 0.555593833 C16orf53 0.748642633 SNHG9 0.498816845 HNRPUL1 0.54417842 LOC347292 0.748154744 NDUFAF3 0.493661179 LASP1 0.543199946 EIF3L 0.747991338 ACOT4 0.493494423 SF3A1 0.537427512 QARS 0.746682333 LIAS 0.493133496 HELZ 0.532982164 TCEAL8 0.738139918 ST6GALNAC4 0.492572367 ABAT 0.532615683 LOC4963 0.737889313 C1orf35 0.491497922 PRKCB1 0.531452289 LOC25845 0.73723136 KIAA143 0.489514968 NCF1B 0.528432749 SMYD3 0.734452589 TIMM22 0.489238613 CUGBP2 0.526196965 MGC87895 0.733843872 TMEM116 0.489235392 ANGPT1 0.523883946 SEC62 0.733293263 DBP 0.488445622 MAPRE3 0.522517685 PRAGMIN 0.731919211 TMEM17 0.487266629 DAPK2 0.521285458 LOC73246 0.731324172 C22orf29 0.485679679 NLRX1 0.518491497 ABHD14A 0.729919691 WDR82 0.47897466 GATAD2A 0.515499364 LOC729279 0.729691569 C2orf15 0.477568944 NR4A2 0.514797225 RAPGEF6 0.729549364 AK5 0.476192334 JARID2 0.514354883 C19orf53 0.728514239 AKTIP 0.474998212 GATS 0.499114393 LOC44113 0.728397238 ZBED3 0.474981147 ARID4A 0.492115532 HSPB1 0.726571375 SH3PXD2A 0.46973856 CHPF2 0.489985486 GPN1 0.726566569 NENF 0.468411812 EPN2 0.488233339 SLC25A3 0.726435744 TGIF1 0.467594838 TMEM33 0.482944342 POLR2G 0.726261788 ZNF559 0.465617668 AGAP8 0.477844348 SUMF2 0.726147864 MMGT1 0.461621563 ATP2B4 0.477783292 GLTSCR2 0.725737593 ZNF252 0.458778973 DIAPH1 0.471135458 LOC6473 0.724122215 PRUNE 0.457978567 METTL9 0.469936938 FBXO32 0.722432538 LOC646836 0.457692454 HSPA1L 0.469749354 TSGA14 0.719854653 LDOC1L 0.457631285 LOC113383 0.468335258 MDH2 0.718886435 CRIP2 0.455459825 KBTBD11 0.46717114 RPS8 0.716652755 ARRDC2 0.453694329 BRPF3 0.465177555 SEPW1 0.716486338 AP2S1 0.452824193 UBE3B 0.461616795 FAM3A 0.715548165 LRRC16A 0.442652223 CD3LB 0.4557268 MAL 0.71483775 CDC42SE2 0.439226622 PAN3 0.455644279 EIF3G 0.713911847 LARGE 0.433862128 TACC1 0.451198563 LOC653737 0.713386474 LOC642755 0.429488267 RAB43 0.449652328 LOC1129424 0.713323277 LOC729985 0.427293919 CLASP1
0.447777232 PLCG1 0.712268761 SERPINE2 0.426932499 FLJ1916 0.445564612 TMEM23 0.711859266 LOC1128252 0.425642295 PDPK1 0.444597997 LYRM7 0.711826946 LOC64634 0.422889942 FAM65B 0.44427143 COMMD7 0.711479625 RTKN2 0.421221582 ARID1A 0.442712377 TECR 0.711389973 ZFP14 0.413753175 DACH1 0.439734865 C16orf3 0.696214995 DECR2 0.392967984 SREBF1 0.429394886 PECI 0.694839698 ZNF24 0.39244688 SRRM2 0.423932667 LOC646688 0.694445665 HPCAL4 0.392296965 ZFYVE27 0.421452588 C1orf151 0.691589944 NT5DC3 0.385937142 TAF4 0.418321979 LOC72942 0.689981631 SNORD18C 0.377829771 RNF13 0.417795315 BTBD2 0.689831116 C19orf39 0.377367974 ZNF644 0.4159187 LOC645515 0.6893317 CNN3 0.374713277 CCDC97 0.399889593 SMPD1 0.688971964 PDZD4 0.371554119 MED31 0.392396434 PPP1R2 0.688489262 LOC652837 0.364795947 NCRNA85 0.382898142 NMT2 0.688136554 KIAA226 0.361262176 ANKRD12 0.382668594 PPM1K 0.687738718 C2orf1 0.354822868 LOC64235 0.382215946 LOC731365 0.686367864 C3orf1 0.354718283 FNBP1 0.36114745 RSL1D1 0.685958983 LOC64331 0.354694241 TWSG1 0.351262263 EEF2 0.685894553 PLD6 0.348358154 AHNAK 0.341474449 PIN1 0.685299297 GSTM3 0.347442317 CMTM4 0.33968982 MTCP1 0.684631822 CBR3 0.322565348 EPAS1 0.335925656 LYRM4 0.683961594 CAMSAP1L1 0.321123437 FAM19A2 0.331599374 LOC439949 0.682459967 C21orf33 0.316181939 BMPR2 0.265431535 MOAP1 0.679537354 ZNF773 0.294777162 C5orf53 0.251347985 NIP7 0.678675569 POTEE 0.294494551 OR7E156P −0.215227946 IFFO2 0.677846416 ELA1 0.293626752 LOC1132493 −0.281312391 NUCB2 0.677791323 SPNS3 0.28537988 SIL1 −0.286555239 MAGEE1 0.677713541 AKR1C3 0.27769758 BCL2L11 −0.341419371 LOC1131662 0.677193155 CCDC23 0.263623678 UHRF2 −0.354936336 MRPS15 0.675764332 GSTM2 0.257679191 PARP15 −0.37762429 NOG 0.675741187 DNTT 0.242897277 SGOL2 −0.411241473 POLR3GL 0.675617726 ACSM3 0.241276627 LOC644482 −0.415712543 RPL17 0.675285949 ZNF683 0.231965799 NCKAP1L −0.418321587 AK3 0.674199622 LAPTM4B 0.228282129 HCFC1R1 −0.449654339 IL23A 0.672979677 C6orf16 0.225251342 LOC92755 −0.452596549 ALDH5A1 0.671134823
GSTM4 0.215359789 BATF −0.463729569 ZNF54 0.667378217 PFKFB3 0.213262843 LOC729779 −0.468574718 SFRS2B 0.667128489 PEMT 0.188677328 ING3 −0.479326333 LOC649821 0.663376153 TOX2 0.157468472 LOC64746 −0.51635382 LPAR5 0.661938675 LOC72949 −0.198886269 LOC644745 −0.516637429 ZNF792 0.661844441 TROVE2 −0.229347861 SERPINB8 −0.523912813 CD4LG 0.659346237 MPDU1 −0.236721729 C15orf57 −0.524154265 LOC147727 0.658543639 BRWD2 −0.272935165 SLC25A19 −0.533627461 FAM12A 0.658423623 ANKRD41 −0.278587817 GNG7 −0.541637763 SLC25A23 0.65773867 WASH2P −0.283377589 CEPT1 −0.568436894 GLRX5 0.655646442 ECT2 −0.326623195 RPS7 −0.573623857 HIGD2A 0.654182518 LGSN −0.351114879 MRPL41 −0.578622978 ZNF26 0.653666419 CLEC12A −0.35367923 CCDC28B −0.58366246 NFX1 0.653548398 LOC44264 −0.384431314 PSMB7 −0.586314985 NELL2 0.653478218 AP1G1 −0.389494962 LOC644877 −0.587312525 NDUFB11 0.653473711 ADCY7 −0.427648158 TCEB1 −0.614147656 CCDC65 0.651898138 MIR1974 −0.429379598 CKS2 −0.619366364 ZNF518B 0.651475739 CTRL −0.448386681 THOC4 −0.625798657 TCEA2 0.649342463 LOC42112 −0.453281873 LOC113181 −0.636264657 LOC113291 0.649229319 ANXA2P3 −0.457511395 LOC7292 −0.648413997 PABPC4 0.649134234 LOC1133875 −0.459712358 MRPL17 −0.672451965 EIF2S3 0.648894172 HM13 −0.461368312 DBI −0.689455395 RPS18 0.646475474 CD74 −0.465298864 LOC113932 −0.717395773 STAT4 0.646221522 LILRA3 −0.467695852 ETFB −0.734397533 CCDC25 0.644689569 ARHGAP3 −0.469736658 NUDCD2 −0.74328978 RPL8 0.644367573 NLRC5 −0.474588382 TMEM126B −0.757728329 PGM2L1 0.643897977 SULT1A2 −0.482875287 GTF3C6 −0.795216188 FKBP1A −0.492172182 JAM3 −0.497945832 FCGR2B −0.514251626 CLEC12B −0.515195232 TRPC4AP −0.519258529 C11orf82 −0.521156625 PTK2B −0.524676726 GPR65 −0.525797342 KLF5 −0.527857833 PKM2 −0.539118323 SAP3L −0.539171373 SULT1A3 −0.547825718 ANXA2P1 −0.548762819 NFKBIB −0.558246324 GDI1 −0.561865494 PSRC1 −0.564178565 HHEX −0.583227669 DIP2B −0.594517957 WWP2 −0.614284312 LOC42221 −0.626577759 SIGLEC7 −0.627915225 LOC1124692 −0.6312228 LILRA1 −0.634928539
MEF2A−0.639317827 HSH2D −0.649436192 CTSC −0.655139391 BIN2 −0.655173425 LSP1−0.668495558 TNFSF13 −0.67161967 EFCAB2 −0.682346884 LOC113251−0.688489257 ILK −0.693325115 HIST1H2AD −0.695734597 LOC648733−0.696389547 C1orf58 −0.712867866 KDM1B −0.718128564 AQP12A −0.724567526LOC65275 −0.73677314 ITGAX −0.744397547 IRF2 −0.769235155 AFF1−0.784337538

TABLE 1.1 Top 50 genes of set 1 with absolute value of weights closest to 1 (highest weights from 0.818423753 to 0.935783593): CD3D UXT RPS4X LOC283412 LOC127295 LOC42694 SKAP1 LOC72882 LOC645173 RPL23A LOC646942 LOC646294 LOC728428 LOC44737 LOC7329 LOC391833 RPS3 RPL36 LOC1127993 LOC73187 LOC72831 LOC653162 LOC729679 LOC441246 LOC387841 C13orf15 LOC728576 EIF3K EEF1B2 LCK LOC39345 RPL4 LOC1132742 EIF3H CD27 RPS15 LOC649447 LOC1131713 LOC286444 LOC729789 RPL1A CD6 LOC646766 C17orf45 CUTA EIF3F LOC642741 LOC388339 RPS14 LOC11398

TABLE 1.2 Top 50 genes of set 2 with absolute value of weights closest to 1 (highest weights from 0.711353632 to 0.899346773): CUTL1 MAST3 STK4 KIAA247 MYH9 RAPGEF2 ARAP3 RAB11FIP1 WBP2 GNAI2 MTMR3 GTF3C6 CBL UBE4B IGF2R YPEL3 SETD1B PIK3CD RASSF2 KDM6B TP53INP2 NUAK2 PAK2 MYO9B NDE1 TMEM126B IRS2 PHF2 MAP2K4 CAMK1D NUDCD2 CDC2L6 ETFB ASAP1 TSC22D3 TLN1 ANXA11 EP3 ROD1 RXRA RASSF5 PELI2 SEMA4D LOC113932 PPM1A CREBBP LAPTM5 CABIN1 PLCB2 WNK1

TABLE 1.3 Top 50 genes of set 3 with absolute value of weights closest to 1 (highest weights from 0.717268353 to 0.952217397): LOC44926 ITM2B HOXC6 LOC392288 YIPF4 RBMS1 USP6 KIAA133 LOC642567 EVI2B UBE2W DDX3X UBE2D1 HIAT1 TTRAP LOC44525 C18orf32 LOC1132888 ROCK1 LOC64798 FAM91A2 SENP6 LOC732229 CEP63 ATG3 LOC1128269 PLAGL1 MBD2 EXOC8 MRRF LOC113377 POTE2 C8orf33 LOC38953 CPEB3 C6orf211 LOC1128533 LOC648863 STX7 SEPT14 LOC4493 LOC442319 NCRNA81 CLEC7A CSNK1A1L LOC643896 P74P LOC4948 GABARAPL2 FCGR3A

TABLE 1.4 Top 50 genes of set 4 with absolute value of weights closest to 1 (highest weights from 0.53314 to 0.982476): SDPR PDE5A PTGS1 CTDSPL CTTN ALOX12 MPL DNM3 C1orf47 C7orf41 C5orf4 RAB27B CXorf2 GRAP2 CDC14B DAB2 TAL1 NCALD ITGB5 GUCY1A3 FERMT3 TSC22D1 LIMS1 SLC8A3 ABCC3 HOMER2 NAT8B FBLN1 ARHGAP21 C21orf7 C15orf52 CABP5 ENDOD1 SOCS4 C15orf26 PVALB SLC24A3 HGD ZNF185 CA2 CXCL5 GRB14 VWF DKFZp686I15217 NDUFS1 GRASP RGS18 C16orf68 MGC135 LOC64926 HIST1H2AE

In certain embodiments, normalized gene expression values of the signature genes in Table 1 can be used as is, that is, without weighting, for the classification of ASD vs non-ASD. In certain embodiments, Boosting (see Scoring/CLASS Identity Methods below) was used to identify the three gene lists with the smallest number of elements that classified subjects with an accuracy of at least 70%, at least 75% and at least 80%, respectively. Sets with 20, 25 and 30 features that can produce at least 70%, at least 75% and at least 80% correct classification include, but are not limited to, those shown in Table 2 below. In certain embodiments, adjusting the weights of the genes based on the age of the subject is the most important single parameter for improving the accuracy of ASD classification.

TABLE 2
Accuracy % | Minimum # of features | Gene list + AGE
80% | 30 | "AGE" AK3 LOC100132510 ARID4A CMTM4 KIAA1430 LOC441013 MAL SETD1B AKR1C3 ATXN7L3B PARP15 AP2S1 CA2 PAN3 MTMR3 TOP1P2 UHRF2 LOC92755 EPOR MED31 LOC389286 LOC646836 MSRB3 GPR65 SMPD1 GPX4 LOC100133770 PRKCB LOC100129424
75% | 25 | "AGE" FCGR3A LOC389342 IGF2R ARAP3 PDE5A MPL CUTL1 LOC642567 SDPR PTGS1 MIR1974 MAP1LC3A LILRA3 LOC100133875 SPI1 LOC653737 IRS2 MAST3 NCF1B STK40 KIAA0247 LOC648863 CTDSPL NCALD
70% | 20 | "AGE" IGF2R ARAP3 FCGR3A LOC389342 LOC648863 SPI1 LOC642567 CUTL1 PDE5A ASAP1 KIAA0247 MAP1LC3A ZNF185 IRS2 MTMR3 LOC100132510 IMPA2 NCALD MPL

The gene-networks signature matrix (GNSM) is a matrix containing weights and features calculated from gene-to-gene interaction patterns. These interaction patterns can also be calculated based on the relationship or state of a gene with non-genomic features.

The multi-modal signature matrix (MMSM) is a matrix containing the quantification of non-genomic features obtained by clinical, behavioral, anatomical and functional measurements. In certain embodiments, the MMSM includes, but is not limited to, age, the GeoPref Test²⁸, MRI/fMRI/DTI and the ADOS test.²⁹˒³⁰ Scores from questionnaires, for instance the CSBS test,³¹ are also included in the MMSM.

The collateral features signature matrix (CFSM) is a matrix containing features that are not measured on the subject under study. In certain embodiments, the CFSM includes, but is not limited to, analytes in maternal blood during pregnancy, having a sibling with autism, maternal genomic signatures or preconditions, and adverse pre- or perinatal events.

In certain embodiments, the invention provides the use of the weighted gene signature matrix (WGSM), which is based, for example, on four sets of genes and gene weights (Weight Sets 1-4, see Table 1) that predict autism with high accuracy. In some exemplary embodiments, the WGSM includes a total of 762 genes as listed above (see Table 1); however, 2 or more genes arranged in any number of sets can be included as well. It is to be understood that both the exact number and the type of genes used in the method can vary based on the model derived from the Autism Reference Database.

In certain embodiments, the WGSM technology of the invention comprises the following steps:

Step 1: Collection of quality blood leukocyte samples and extraction of RNA from leukocytes. Blood leukocytes are collected from a newborn, infant, toddler or young child as part of a general pediatric screening procedure, or as a diagnostic test for those at high risk for autism (such as younger siblings of an autistic child) or suspected to have autism. Temperature and history are taken and documented prior to blood sample collection. Samples are collected only if the child has had no fever, cold, flu, infection or other illness, and no use of medications for illness, in the 72 hours prior to the blood draw. If a child has a fever, cold, etc., blood samples should be collected no sooner than one week after the illness is over.
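The timing rules in Step 1 can be expressed as a simple eligibility check. This is a minimal sketch: the function name, its arguments and the use of timestamps are illustrative assumptions, not part of the described protocol.

```python
from datetime import datetime, timedelta
from typing import Optional

# Windows taken from Step 1; encoding them as timedeltas is an assumption.
MEDICATION_WASHOUT = timedelta(hours=72)  # no illness medication in the 72 h before the draw
ILLNESS_WAIT = timedelta(weeks=1)         # draw no sooner than a week after an illness ends


def eligible_for_blood_draw(draw_time: datetime,
                            currently_ill: bool = False,
                            illness_ended: Optional[datetime] = None,
                            last_medication: Optional[datetime] = None) -> bool:
    """Hypothetical helper: True if a draw at `draw_time` meets the Step 1 rules."""
    if currently_ill:
        return False
    # After an illness, wait at least one week from the day it ended.
    if illness_ended is not None and draw_time - illness_ended < ILLNESS_WAIT:
        return False
    # No medication for illness within the 72 hours before the draw.
    if last_medication is not None and draw_time - last_medication < MEDICATION_WASHOUT:
        return False
    return True
```

The one-week post-illness wait is stricter than the 72-hour window, so the medication check only matters when medication was given without a documented illness end date.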

Four to six ml of blood are collected into EDTA-coated tubes. Leukocytes are captured and stabilized immediately (for instance via a LEUKOLOCK filter, Ambion, Austin, Tex., USA) and placed in a −20 °C freezer for later processing.

mRNA is extracted from leukocytes according to standard practices. For example, if LEUKOLOCK disks are used, they are freed from RNAlater, and Tri-reagent is used to flush out the captured leukocytes and lyse the cells. RNA is subsequently precipitated with ethanol and purified through washing and cartridge-based steps. The quality of mRNA samples is determined with RNA Integrity Number (RIN) assays, and only values of 7.0 or greater are considered acceptable for use in the next steps. RNA is quantified using, for example, a Nanodrop (Thermo Scientific, Wilmington, Del., USA).

Step 2: Determination of expression levels for the genes used in the Weighted Gene Test of Autism. Whole-genome gene expression levels are obtained using either a microarray-based platform (such as Illumina HT-12 or equivalent) or next-generation sequencing. Gene expression levels can also be analyzed using a targeted approach based on custom microarrays, targeted sequencing or PCR-based amplification of the WGSM and/or gene-networks signature matrix (GNSM) genes (see Gene Expression Profiling below).

Whichever method is used, however, it should provide high-fidelity expression levels for each of the genes in the WGSM. This is achieved by using methods that interrogate the signal intensity and distribution of each probe/gene. For instance, a detection call p-value of 0.01 is used as the threshold to filter out probes/genes with expression levels of poor quality. For analyses performed on multiple subjects simultaneously, any probe/gene with no detectable levels in at least one subject is also eliminated. Once the final set of probes/genes with high-fidelity expression levels is determined, the data are transformed (for instance with the "log 2" function) and normalized. The normalization step helps to obtain informative expression levels that are comparable to those in the weighted gene expression reference database.
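The probe filtering and log2 transformation described above can be sketched as follows. The sketch assumes each probe carries a per-subject (intensity, detection p-value) pair and that "detected" means p < 0.01; the strict comparison, the function name and the data layout are assumptions.

```python
import math
from typing import Dict, List, Tuple

DETECTION_P = 0.01  # detection-call p-value threshold given in the text


def filter_and_log2(probes: Dict[str, List[Tuple[float, float]]]) -> Dict[str, List[float]]:
    """Hypothetical helper. `probes` maps probe ID -> per-subject
    (intensity, detection p-value) pairs.

    A probe is kept only if it is detected (p < DETECTION_P) in every
    subject; the kept intensities are log2-transformed.
    """
    kept = {}
    for probe_id, values in probes.items():
        if all(p < DETECTION_P and intensity > 0 for intensity, p in values):
            kept[probe_id] = [math.log2(intensity) for intensity, _ in values]
    return kept
```

For example, a probe detected in one subject but not in another (p = 0.2) is dropped entirely, matching the rule for multi-subject analyses.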

In certain embodiments, the weighted gene feature test of autism (WGFTA) technology uses the simultaneous analysis of at least 20, 40, 80, 150 or more subjects (recruited and processed with criteria similar to those of the reference dataset discussed below) for independent normalization. When fewer subjects are available, they can be added to the reference database prior to normalization. Normalization can then be performed using, for instance, the "quantile" method.
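The text names the "quantile" method without spelling it out. One common formulation is sketched below as an assumption: every sample is forced onto the same distribution, namely the across-sample mean of the values at each rank. Ties are broken by sort order rather than averaged, which is a simplification.

```python
from typing import List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """samples[s][g] is the (log2) expression of gene g in sample s.

    Each value is replaced by the mean, across samples, of the values that
    share its rank, so all samples end up with an identical distribution.
    """
    n_genes = len(samples[0])
    # For each sample, gene indices ordered from lowest to highest value.
    orders = [sorted(range(n_genes), key=lambda g: s[g]) for s in samples]
    # Mean value at each rank across all samples.
    rank_means = [
        sum(s[order[r]] for s, order in zip(samples, orders)) / len(samples)
        for r in range(n_genes)
    ]
    normalized = []
    for order in orders:
        row = [0.0] * n_genes
        for r, g in enumerate(order):
            row[g] = rank_means[r]
        normalized.append(row)
    return normalized
```

When fewer subjects are available, the new samples would simply be appended to the reference samples before calling the function, so that the rank means are dominated by the reference set.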

At the conclusion of Step 2, the normalized gene expression value (NGEV) for an individual subject or patient has been determined for each gene in the WGM. In some embodiments, one or more NGEVs are used to classify genes for use in the methods of the invention without further applying a gene-specific weight. In certain embodiments, the NGEVs are used with MMSM and/or CFSM values. In alternative embodiments, the NGEVs are used without MMSM and/or CFSM values.

Step 3: The first procedure in the weighted gene feature test of autism (WGFTA) is the application of the gene-specific weights from the weighted gene signature matrix to the NGEVs of each child. For each gene in the WGSM, its NGEV is multiplied by that gene's gene-specific weight (for example, see Table 1). The resulting value for each gene is the weighted gene expression level. In certain embodiments, the genes in the representative example Weights Sets 1-4 constitute the genes in the WGSM and are used in the WGFTA.
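The gene-wise weighting in Step 3 is an element-wise product. The sketch below assumes NGEVs and weights are held in dictionaries keyed by gene symbol; the function name is hypothetical, and the gene names and weights used in the example are placeholders, not entries from Table 1.

```python
from typing import Dict


def weighted_expression(ngev: Dict[str, float],
                        weights: Dict[str, float]) -> Dict[str, float]:
    """Multiply each WGSM gene's normalized expression value by its weight.

    A WGSM gene missing from the subject's NGEVs raises KeyError rather
    than being silently skipped; genes outside the WGSM are ignored.
    """
    return {gene: ngev[gene] * w for gene, w in weights.items()}
```

For example, with hypothetical weights {"GENE_A": 0.5, "GENE_B": -0.25} and NGEVs {"GENE_A": 8.0, "GENE_B": 4.0, "GENE_C": 7.0}, the weighted levels are {"GENE_A": 4.0, "GENE_B": -1.0}.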

The weighted gene expression levels in a subject's (or patient's) sample can be further processed to reduce dimensionality using methods such as principal component analysis (PCA), eigenvalue decomposition or multi-dimensional scaling (MDS). This step reduces computation time and data noise and increases power in the subsequent analysis steps, while preserving the biological information useful for classification. If computation power, time and data noise are not obstacles, the weighted gene expression level data of each subject or patient can be used as is in the next step.
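A minimal sketch of the dimensionality-reduction step, assuming PCA is used and, for brevity, that only the first principal component is kept (further components would follow by deflation). Power iteration on the sample covariance matrix is an illustrative choice; any standard PCA routine would serve equally well.

```python
import math
import random
from typing import List


def first_principal_component(data: List[List[float]], iters: int = 500,
                              seed: int = 0) -> List[float]:
    """Project each row (one subject's weighted expression levels) onto the
    first principal component. The sign of the scores is arbitrary."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix (denominator n - 1).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration converges to the leading eigenvector of cov.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]
```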

Step 4: The second procedure in the weighted gene feature test of autism (WGFTA) is the comparison of weighted gene expression levels to a unique autism and control weighted gene expression reference database. The subject's or patient's set of weighted gene expression levels is compared to the specific multidimensional weighted gene expression reference database to establish a score for autism risk and/or a class identity (ASD, non-ASD). Two different scoring or CLASS identity methods are applied (see below).

In certain embodiments, the performance of the invention is characterized by the prediction accuracy of the weighted reference database, the ROC curves with estimated AUC, the accuracy, specificity and sensitivity, and the matrix of weights for the identified gene sets. See FIGS. 4C and 4D (logistic regression analysis and classification outcome of the weighted reference database) and Table 1.
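The performance measures listed above can be computed from true and predicted labels. This is a hedged sketch that assumes the +1 (ASD) / −1 (non-ASD) coding used under Scoring/CLASS Identity Methods; the AUC is computed through its equivalence with the Mann-Whitney statistic rather than by integrating an ROC curve.

```python
from typing import Dict, List


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity with ASD = +1 and non-ASD = -1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == -1 and p == -1 for t, p in pairs)
    fp = sum(t == -1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == -1 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of ASD cases detected
        "specificity": tn / (tn + fp),  # fraction of non-ASD cases correctly cleared
    }


def auc(y_true: List[int], scores: List[float]) -> float:
    """ROC AUC as the probability that a random ASD case scores higher than
    a random non-ASD case (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```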

Scoring/CLASS Identity Methods

In certain embodiments, the following scoring methods are used. However, any available scoring methods, now known or later developed, are encompassed within the scope of the invention.

In certain embodiments, methods use boosted classification trees to build the screening, diagnostic and prognostic classifiers, with or without the use of modules to classify the genes. This classification regime is divided into two main components. First, the underlying classification algorithm is a classification tree. Second, boosting is applied to this baseline classifier to increase its prediction strength. The resulting learning algorithm retains the strengths of the baseline classifier while improving the overall predictive capability. In particular embodiments, there are two classes, ASD and non-ASD, represented symbolically by +1 and −1. The training dataset consists of labeled cases $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$. Here, $y_i$ is a class label and $x_i$ is a vector of variables or features measured for the i-th individual. A classifier is represented by a function $C(x)$ whose input is a vector $x$ in the feature space and whose output is one of the class labels.

In the first component, namely classification trees, the underlying learning algorithm is a decision tree for classification. Any classifier can be represented by a partition of the feature space into disjoint regions $R_1, R_2, \ldots, R_k$ and associated labels $c_1, c_2, \ldots, c_k$. The class of a new, unlabeled case is predicted by locating the region into which the feature coordinates of the case fall and reading off the class label for that region. In a decision tree, this partition is represented by the leaf nodes of a binary tree (see FIG. 15). Starting at the root of the tree, each node represents a subdivision of a region of the feature space, obtained by splitting it on one of the variables. The feature space is thus recursively divided into increasingly finer sub-regions. The "leaf" nodes at the bottom of the tree are assigned class labels. The best partition for classification is learned from the data: for a given node, the variable from the full feature set, and the threshold value for that variable, that best separate the data into its constituent classes are selected, producing two child nodes. The selection is based on maximizing some measure of fitness of the resulting classifier, such as the information gain. The process is repeated for each node until a halting criterion is reached, such as when all of the training data points in a given sub-region are of the same class.
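The split-selection rule described above (choose the variable and threshold that maximize information gain) can be sketched for a single node as follows. Scanning midpoints between sorted distinct values is an assumption about how candidate thresholds are generated; the function names are illustrative.

```python
import math
from typing import List, Tuple


def entropy(labels: List[int]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    h = 0.0
    for c in set(labels):
        q = labels.count(c) / n
        h -= q * math.log2(q)
    return h


def best_split(X: List[List[float]], y: List[int]) -> Tuple[int, float, float]:
    """Return (feature index, threshold, information gain) of the best binary
    split x[feature] <= threshold; (-1, 0.0, 0.0) if no split helps."""
    parent = entropy(y)
    best = (-1, 0.0, 0.0)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            # Gain = parent entropy minus size-weighted child entropies.
            gain = parent - (len(left) * entropy(left)
                             + len(right) * entropy(right)) / len(y)
            if gain > best[2]:
                best = (j, t, gain)
    return best
```

Growing a full tree would apply `best_split` recursively to each child until a halting criterion, such as a pure node, is met.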

Then, in the second component, namely boosting, the classification tree is improved using a boosting algorithm (such as AdaBoost). This algorithm works by iteratively fitting a baseline classifier and using its performance on the training data to re-weight the importance of each point in subsequent fits (see FIG. 16). Initially, each of the data points is given equal weight. After fitting the classifier, the error rate on the training data is used to produce a weight α associated with the classifier. The weights of the data points are then updated: the weights of misclassified points are increased, while correctly labeled points are de-emphasized. This forces the next classifier to pay more attention to cases where errors were previously made. The process is repeated using the re-weighted observations in the next iteration; it halts when the test error (computed from a test data set or via cross-validation) has stabilized, or when a fixed number of iterations has been reached. Formally, the algorithm proceeds as follows. Let $w_i$ be the weight of the i-th training point, $i = 1, \ldots, N$. Initialize the weights as $w_i = 1/N$. For $j = 1, \ldots, J$, do the following:

1. Fit the classifier $C_j(x)$ to the weighted data set.

2. Compute the weighted training error rate $e_j = \frac{\sum_{i=1}^{N} w_i \, I(y_i \neq C_j(x_i))}{\sum_{i=1}^{N} w_i}$.

3. Compute the weight $\alpha_j = \ln\left(\frac{1 - e_j}{e_j}\right)$.

4. For each $i$, update the weights according to $w_i \leftarrow w_i \cdot \exp\left(\alpha_j \, I(y_i \neq C_j(x_i))\right)$.

The result is a sequence $(C_1(x), \alpha_1), (C_2(x), \alpha_2), \ldots, (C_J(x), \alpha_J)$ of classifiers and associated weights. The sequence is combined into a final classifier by taking the sign of a weighted sum of the sequence: $C(x) = \operatorname{sign}\left(\sum_{j=1}^{J} \alpha_j C_j(x)\right)$.
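The four steps above can be sketched in a few dozen lines. For brevity, the baseline classifier here is a depth-1 tree (a "stump") rather than a full classification tree, and a fixed number of rounds replaces the stabilized-test-error halting rule; both are simplifying assumptions, and all names are illustrative.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, label for x <= threshold)


def stump_predict(stump: Stump, x: List[float]) -> int:
    """Depth-1 tree: return `s` if x[feature] <= threshold, else -s."""
    j, t, s = stump
    return s if x[j] <= t else -s


def fit_stump(X: List[List[float]], y: List[int], w: List[float]) -> Stump:
    """Step 1: choose the split with the lowest weighted misclassification."""
    best: Stump = (0, X[0][0], 1)
    best_err = float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((j, t, s), xi) != yi)
                if err < best_err:
                    best, best_err = (j, t, s), err
    return best


def adaboost(X: List[List[float]], y: List[int],
             n_rounds: int = 10) -> List[Tuple[Stump, float]]:
    n = len(y)
    w = [1.0 / n] * n                                   # w_i = 1/N
    ensemble = []
    for _ in range(n_rounds):                           # j = 1, ..., J
        stump = fit_stump(X, y, w)                      # step 1
        miss = [stump_predict(stump, xi) != yi for xi, yi in zip(X, y)]
        e = sum(wi for wi, m in zip(w, miss) if m) / sum(w)  # step 2
        e = min(max(e, 1e-10), 1 - 1e-10)               # keep the log finite for perfect fits
        alpha = math.log((1 - e) / e)                   # step 3
        w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]  # step 4
        total = sum(w)
        w = [wi / total for wi in w]                    # rescale; e_j is unaffected
        ensemble.append((stump, alpha))
    return ensemble


def predict(ensemble: List[Tuple[Stump, float]], x: List[float]) -> int:
    """C(x) = sign(sum_j alpha_j C_j(x)); a zero sum is mapped to +1 here."""
    total = sum(alpha * stump_predict(stump, x) for stump, alpha in ensemble)
    return 1 if total >= 0 else -1
```

Rescaling the weights after each round does not change the error rate in step 2, because $e_j$ is already divided by the weight sum; it only keeps the numbers bounded.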

In other embodiments, an alternative to the tree-based classifier can be used, such as distance-based methods that use distances in the feature space to predict the class labels. The procedure quantifies the extent to which a given set of features conforms to each of the classes and predicts the label of the class with the highest concordance. For each class, the mean vector $\mu$ and covariance matrix $\Sigma$ of the feature distribution are estimated using the sample mean and sample covariance matrix. Then, for a given point $x$ in the feature space, the Mahalanobis distance between $x$ and the class mean, $d = \left((x - \mu)^{T} \Sigma^{-1} (x - \mu)\right)^{1/2}$, is computed. The predicted label for $x$ is the label of the class that minimizes this distance. The performance of the resulting classifier can then be improved by using it as the baseline classifier in the boosting procedure outlined above.
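The distance-based classifier described above can be sketched in plain Python. The sketch estimates a mean and an inverse sample covariance per class and assigns a new point to the class with the smallest Mahalanobis distance; the helper names and the Gauss-Jordan inversion are illustrative choices.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(rows: Matrix) -> Vector:
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows: Matrix) -> Matrix:
    """Sample covariance matrix (denominator n - 1)."""
    mu, n, p = mean_vector(rows), len(rows), len(rows[0])
    return [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting; m must be non-singular."""
    p = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(m)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(p):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[p:] for row in aug]


def mahalanobis(x: Vector, mu: Vector, cov_inv: Matrix) -> float:
    """d = ((x - mu)^T Sigma^-1 (x - mu))^(1/2)."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    p = len(d)
    return math.sqrt(sum(d[a] * cov_inv[a][b] * d[b]
                         for a in range(p) for b in range(p)))


def fit(classes: Dict[int, Matrix]) -> Dict[int, Tuple[Vector, Matrix]]:
    """Estimate (mean, inverse covariance) for each class label."""
    return {label: (mean_vector(rows), invert(covariance(rows)))
            for label, rows in classes.items()}


def predict(model: Dict[int, Tuple[Vector, Matrix]], x: Vector) -> int:
    """Return the label whose class mean is nearest in Mahalanobis distance."""
    return min(model, key=lambda label: mahalanobis(x, *model[label]))
```

When there are more features than subjects, the sample covariance is singular and cannot be inverted, so in practice a classifier of this kind would be applied after the dimensionality-reduction step or with a regularized covariance estimate.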

With multiple feature sets, the model detailed here can be fit using a wide range of features for prediction. In some instances, only certain types of features may be available at the time of prediction. For example, only gene expression signatures and age might have been observed for a particular patient. The model can be fit using various combinations of feature modalities from the MMSM and CFSM, as well as the GNSM. The result is a suite of classifiers, each suited to a different configuration of feature types. This yields a classification procedure that can be used across a range of patient data availabilities and is therefore robust in the applied setting.
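One way to operationalize the suite of classifiers is to index each trained classifier by the set of modalities it was fit on and, for a new patient, pick the one that uses the most modalities actually available. The lookup below is an assumption about how this could work; the modality names in the example and the tie-breaking rule (first match in insertion order) are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Optional, Set

Classifier = Callable[[dict], int]


def select_classifier(suite: Dict[FrozenSet[str], Classifier],
                      available: Set[str]) -> Optional[Classifier]:
    """Pick the classifier trained on the largest set of modalities that are
    all available for this patient; None if no configuration fits."""
    usable = [mods for mods in suite if mods <= available]
    if not usable:
        return None
    return suite[max(usable, key=len)]
```

For a patient with gene expression data and age only, a classifier fit on {"WGSM", "age"} would be chosen over one that also requires MRI measures.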

Performance of the WGSM was tested with several algorithms including, but not limited to, Random Forest-, Neural Network-, Support Vector Machine-, Boosting- and Logistic Regression-based methods, and was independently validated on a second dataset of autism and non-autism subjects. This testing showed high accuracy in the diagnostic classification of autism (80% or greater classification accuracy), thus confirming: 1) the efficacy and specificity of a unique pattern of gene weights, 2) the relevance, sensitivity and specificity of the four identified gene sets, and 3) the reliability of the multi-dimensional weighted reference of autism and control.

Score calculation and class prediction are generated by the computer-based algorithms selected to test the WGSM on the new subject(s) (see previous paragraphs). Matrices are compared using distance-based classification between the new subject(s)' matrices and the reference matrices from both the ASD and control subjects.

Gene Expression Profiling

The invention enables the use of both genome-wide and gene-targeted approaches to quantify gene expression levels in the peripheral blood leukocytes of a test subject. As used herein, genome-wide approaches include, but are not limited to, microarray-based platforms and next-generation sequencing. Expression levels of the genes belonging to the WGSM are extracted after standard normalization, transformation and filtering steps (see Methods in the Examples below). As used herein, gene-targeted approaches include, but are not limited to, microarray-based platforms and PCR-based amplification.

With the targeted approaches, only the expression levels of the genes belonging to the WGSM are determined. The use of whole-genome microarrays requires an a priori construction of a gene library or the use of a gene-capturing method. Alternatively, the targeted approach via a microarray-based platform is carried out by constructing custom-designed gene expression microarrays that contain only the genes from the four gene sets, together with control and reference probes, replicated on the same platform to allow high reproducibility and testing of multiple patients. Gene expression levels are then calculated using control probes, reference genes and/or experiments.

WGSM Features:

1) Signature gene composition: The provided example of the WGSM includes 762 genes. However, any 2 or more genes can be assayed on different platforms: array-based, sequencing-based or PCR-based.

2) Splice-variant information for the genes within the WGSM is also used.

3) Data reduction tools are also applied to the genes of the WGSM.

In some embodiments, the invention provides that the WGSM can be used alone in the Weighted Gene Feature Tests of Autism (WGFTA) or in combination with one or more of the other matrices described above. In certain embodiments, the combined use of the WGSM with the subject's age as a Multi-Modal Signature Matrix (MMSM) feature is provided.

A major strength of the signature discovery was the recruitment of subjects using a general, naturalistic population screening approach. This approach allowed the unbiased, prospective recruitment and unique study of autism and contrast patients as they occur in community pediatric clinics. To maximize the number of ASD and control subjects for the signature discovery, a slight age difference between the two subject distributions was tolerated, and age was included as a predictor in the classification analysis. The impact of all predictors was then assessed in the classification of the subjects by logistic regression with a binomial distribution. The results of these analyses are as follows:

a) Analysis using age as the only predictor of diagnosis showed a very small ODDS ratio of 1.07 towards the ASD CLASS.

b) Analysis using each of the Weights Sets 1-4 as a single predictor showed ODDS ratios far from 1 (9577.88, 17423.52, 4.16e-05 and 3716.94, respectively).

c) Analysis using all predictors together (Weights Sets 1-4 and age) again showed ODDS ratios far from 1 for the Weights Sets 1-4 predictors (1.73e+06, 1.46e+05, 5.31e-03 and 6.235152e+01, respectively) and an ODDS ratio close to 1 for age (1.089). Across different algorithms, classification performance improved on average by 3-4% when age was included (see Table 3).

TABLE 3 Classification performance using different algorithms with and without age as predictor
Algorithm name | % Accuracy without AGE, discovery set | % Accuracy without AGE, replication set | % Accuracy with AGE, discovery set | % Accuracy with AGE, replication set
glmnet | 78 | 72 | 82 | 75
mlp | 78 | 72 | 83 | 69
cforest | 87 | 70 | 91 | 72
svm radial | 81 | 68 | 87 | 70
random forest | 100 | 70 | 100 | 67
qvnnet | 84 | 65 | 84 | 71

The transformation from odds to probability is monotonic: a probability p corresponds to odds of p/(1 − p), so p = odds/(1 + odds). Odds of 1 correspond to a probability of 0.5 of falling into either CLASS, in this case ASD and non-ASD, and an ODDS ratio of 1 indicates that a predictor does not shift the odds toward either CLASS. An ODDS ratio tending to infinity or zero indicates that the predictor pushes the probability of being classified ASD very high or very low, respectively. Therefore, although age effects are present in this study, they are very small compared with the effects of the gene expression signature predictors (Weights Sets 1-4 in Table 1). Moreover, this effect was empirically quantified by classifying both discovery and replication subjects with and without age as a predictor. Classification accuracy increased by about 3-4% or more when age was included as a predictor in the analysis, and so in certain embodiments the invention uses age as a predictor in screening, diagnostic and prognostic signatures of ASD, as shown in Example 3 below.
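The relationship between odds, probability and logistic-regression odds ratios can be made concrete with a few helper functions. These are illustrative, not part of the described analysis; the numeric values in the note below simply reuse odds ratios reported above.

```python
import math


def odds_to_probability(odds: float) -> float:
    """p = odds / (1 + odds); odds of 1 give p = 0.5."""
    return odds / (1.0 + odds)


def probability_to_odds(p: float) -> float:
    """odds = p / (1 - p), the inverse of odds_to_probability."""
    return p / (1.0 - p)


def odds_ratio(beta: float) -> float:
    """Odds ratio for a one-unit increase in a logistic-regression predictor
    with coefficient beta; beta = 0 gives an odds ratio of 1 (no effect)."""
    return math.exp(beta)
```

For illustration only: starting from odds of 1 (p = 0.5), a one-unit change in a predictor with an odds ratio of 1.07 moves the probability only to about 0.517, whereas an odds ratio of 9577.88 moves it above 0.999.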

Similarly, additional predictors can be drawn from the MMSM, which includes non-genomic quantifiable features obtained by clinical, behavioral, anatomical and functional measurements. In certain embodiments, clinical features are scores on the ADOS, Mullen, Vineland, and any other diagnostic and psychometric test instruments. In certain embodiments, neurobehavioral features are eye-tracking tests, such as the GeoPreference Test of autism, and exploration tests. In certain embodiments, anatomical features are MRI neuroanatomical measures including, but not limited to, global and regional gray or white matter volumes, cortical surface areas or thickness, and cortical gyrification, as determined by methods including, but not limited to, voxel-based, statistical mapping-based and surface or structure reconstruction-based methods (e.g., temporal gray matter volumes, and DTI measures including tract FA and volume and gyral patterns of cortical tract projections).

In certain embodiments, the functional features are fMRI measures including, but not limited to, activation, psychophysiological interaction (PPI), dynamic causal modeling, and unsupervised classification information maps and values. Similarly, features from the GNSM and CFSM are used, with or without the WGSM, as predictors in the classification and prognostic analyses.

Therefore, the invention in some embodiments uses a test based on a specific pattern of gene-specific weights in a person for genes involved in governing cell cycle, DNA damage response, apoptosis, protein folding, translation, cell adhesion and immune/inflammation, signal transduction ESR1-nuclear pathway, transcription-mRNA processing, cell cycle meiosis, cell cycle G2-M, cell cycle mitosis, cytoskeleton-spindle microtubule, and cytoskeleton-cytoplasmic microtubule functions. The WGFTA provided by the invention requires high-quality molecular components, including RNA, genomic DNA, cellular and serum proteins, and small-molecule analytes, that are extracted by clinically standard methods from blood and other tissues collected using clinically routine methods from subjects aged birth to 1 year, 1 year to 2 years, 2 years to 3 years and 3 years to 4 years. The present invention provides that the DNA and/or mRNA can be collected in many ways and/or isolated or purified directly from a biological tissue or cell sample, including but not limited to tears, saliva, mucus, buccal swab, whole blood, serum, plasma, cerebrospinal fluid, urine, and the like, or cells including, but not limited to, fibroblasts, iPS cells, neuroprogenitor cells derived from iPS cells, and neurons derived from iPS cells. A biological sample could also be obtained from specific cells or tissue, or from any secretions or exudate. In certain embodiments, the biological sample is a biological fluid obtained from peripheral blood. In certain embodiments, DNA is isolated or purified from peripheral blood mononuclear cells (PBMCs) derived from fresh blood. Techniques for purification of biomolecules from samples such as cells, tissues, or biological fluid are well known in the art. The technique chosen may vary with the tissue or sample being examined, but it is well within the skill of the art to match the appropriate purification procedure with the test sample source.

In some embodiments, the WGFTA of the invention uses any one of several known, state-of-the-art whole-genome RNA-based gene expression assays (such as RNA sequencing, custom gene expression arrays, PCR-based assays, whole-genome microarrays or genome sequencing) that give accurate expression levels. In some embodiments, the WGFTA is based on gene sets such as, for example, the four sets Weights Set 1, Weights Set 2, Weights Set 3 and Weights Set 4 in Table 1. In certain embodiments, the WGFTA includes specific splice variants of the genes. In some embodiments, the Weighted Gene Matrix comprises the genes in the WGFTA and their Gene-Specific Weights (see Table 1). Furthermore, in certain embodiments, the autism-critical weighted gene expression levels are obtained by transforming an individual's normalized expression levels of the genes in the weighted gene signature matrix through gene-wise multiplication by the gene-specific weights.

Depending upon the factors unique to each case and the desired level of specificity and accuracy, any number of genes may be selected, for example, from those described in Table 1. In some embodiments, the genes are generally ranked according to relative importance based on the absolute value of the weight. In certain embodiments, the number of genes chosen includes at least 10, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750 or more genes, including intervening and greater numbers, within a selected gene set.
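Ranking genes by the absolute value of their weight and keeping the top N can be sketched as follows. The function name is hypothetical, ties are broken alphabetically as an arbitrary assumption, and the gene names and weights in the test are placeholders rather than Table 1 entries.

```python
from typing import Dict, List


def top_genes(weights: Dict[str, float], n: int) -> List[str]:
    """Return the n genes with the largest absolute weight.

    Positive and negative weights are treated alike, since only the
    magnitude is used for ranking; ties are broken by gene name.
    """
    ranked = sorted(weights.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    return [gene for gene, _ in ranked[:n]]
```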

In certain embodiments, the WGFTA of the invention comprises (a) the application of the unique gene-specific weights to an individual's normalized gene expression values for those genes in order to derive that individual's autism-critical weighted gene expression levels; (b) the application of any subset of the unique Gene-Specific Weights derived from the effort to optimize the classification performance of the Weighted Gene and Feature Test of Autism; (c) the modification of the Weights in the Weighted Gene Signature Matrix as the result of the optimization of the Autism and non-autism Weighted Gene Expression Reference Database from which the WGM is derived; (d) the comparison of an individual's Autism-Critical Weighted Gene Expression Levels to the Autism and non-autism Weighted Gene Expression Reference Database; and (e) the development and use of any RNA-based assay that uses the Weighted Gene Signature Matrix to test risk for autism.

In some embodiments, the invention also provides for the development and use of any RNA-based gene expression data combined with MMSM measures (for example, anatomical and/or functional brain measurements) to screen for autism risk or to diagnostically classify autism and other developmental disorders. For example, in certain embodiments, age is considered in conjunction with a subject's gene expression levels and as a predictor (for example, see Scoring/CLASS Identity Methods above) in adjusting and/or improving screening for autism risk, autism diagnostic classification and prognosis analysis, and the WGFTA is based on the comparison of autistic subject(s) to non-autistic subject(s). Further, the use of the GeoPreference test score, CSBS (Communication and Symbolic Behavior Scales) test scores and genomic DNA (CNV, SNV, indel) markers in combination with expression signatures (for example, in one embodiment of the method described in Scoring/CLASS Identity Methods above) increases the WGFTA performance and improves the classification of autism and other developmental and neuropsychiatric disorders.

In some embodiments, the autism and non-autism reference database provided by the invention comprises the collection of Gene Expression Levels, Weights and all non-genomic features already described that were uniquely derived from fully clinically characterized and diagnostically confirmed infants, toddlers and young children with autism, typically developing (TD) subjects, and non-autism non-TD subjects.

Therefore, the weighted gene and feature tests of autism (WGFTA) provided by the invention can in some embodiments be used in pediatric population screens for risk of autism and in clinical follow-up diagnostic and prognostic evaluations of newborns, infants, toddlers and young children who are suspected to be at risk for autism. Some attributes of the invention are based on analyses of in vivo functional genomic abnormalities in mRNA expression from blood leukocytes as they relate to measures of brain and cerebral size and to mRNA expression patterns in typically developing controls. Thus, in certain embodiments, the invention is based on direct experimental knowledge of the functional genomic defects and the resulting brain size relationships that are disrupted in autistic toddlers as compared to control subjects.

REFERENCES

1. Developmental Disabilities Monitoring Network Principal Investigators. Prevalence of autism spectrum disorders--Autism and Developmental Disabilities Monitoring Network, 14 sites, United States, 2008. MMWR Surveill Summ 2012; 61:1-19.
2. Huttenlocher P R. Dendritic and synaptic development in human cerebral cortex: time course and critical periods. Developmental Neuropsychology 1999; 16:347-349.
3. Huttenlocher P R, Dabholkar A S. Regional differences in synaptogenesis in human cerebral cortex. Journal of Comparative Neurology 1997; 387:167-78.
4. Pierce K, Carter C, Weinfeld M, et al. Detecting, studying, and treating autism early: the one-year well-baby check-up approach. The Journal of Pediatrics 2011; 159:458-65 e1-6.
5. Wetherby A M, Brosnan-Maddox S, Peace V, Newton L. Validation of the Infant-Toddler Checklist as a broadband screener for autism spectrum disorders from 9 to 24 months of age. Autism 2008; 12:487-511.
6. Pandey J, Verbalis A, Robins D L, et al. Screening for autism in older and younger toddlers with the Modified Checklist for Autism in Toddlers. Autism 2008; 12:513-35.
7. Kleinman J M, et al. The modified checklist for autism in toddlers: a follow-up study investigating the early detection of autism spectrum disorders. J Autism Dev Disord 2008; 38:827-39.
8. Chlebowski C, Robins D L, Barton M L, Fein D. Large-scale use of the modified checklist for autism in low-risk toddlers. Pediatrics 2013; 131:e1121-7.
9. Zwaigenbaum L, Bryson S, Rogers T, Roberts W, Brian J, Szatmari P. Behavioral manifestations of autism in the first year of life. Int J Dev Neurosci 2005; 23:143-52.
10. Ozonoff S, Young G S, Carter A, et al. Recurrence risk for autism spectrum disorders: a Baby Siblings Research Consortium study. Pediatrics 2011; 128:e488-95.
11. Paul R, Fuerst Y, Ramsay G, Chawarska K, Klin A. Out of the mouths of babes: vocal production in infant siblings of children with ASD. J Child Psychol Psychiatry 2011; 52:588-98.
12. Landa R, Garrett-Mayer E. Development in infants with autism spectrum disorders: a prospective study. Journal of Child Psychology and Psychiatry, and Allied Disciplines 2006; 47:629-38.
13. Abrahams B S, Geschwind D H. Advances in autism genetics: on the threshold of a new neurobiology. Nature Reviews Genetics 2008; 9:341-355.
14. Chow M L, Pramparo T, Winn M E, et al. Age-dependent brain gene expression and copy number anomalies in autism suggest distinct pathological processes at young versus mature ages. PLoS Genet 2012; 8(3):e1002592.
15. Devlin B, Scherer S W. Genetic architecture in autism spectrum disorder. Current Opinion in Genetics & Development 2012 Jun; 22(3):229-237.
16. Courchesne E, Pierce K, Schumann C M, et al. Mapping early brain development in autism. Neuron 2007; 56:399-413.
17. Stanfield A C, McIntosh A M, Spencer M D, Philip R, Gaur S, Lawrie S M. Towards a neuroanatomy of autism: a systematic review and meta-analysis of structural magnetic resonance imaging studies. European Psychiatry: the Journal of the Association of European Psychiatrists 2008 Jun; 23(4):289-299.
18. Courchesne E, Mouton P R, Calhoun M E, et al. Neuron number and size in prefrontal cortex of children with autism. JAMA 2011 Nov 9; 306(18):2001-2010.
19. Courchesne E, Karns C M, Davis H R, et al. Unusual brain growth patterns in early life in patients with autistic disorder: an MRI study. Neurology 2001; 57:245-254.
20. Glatt S J, Tsuang M T, Winn M, et al. Blood-based gene expression signatures of infants and toddlers with autism. J Am Acad Child Adolesc Psychiatry 2012; 51:934-44 e2.
21. Kong S W, Collins C D, Shimizu-Motohashi Y, et al. Characteristics and predictive value of blood transcriptome signature in males with autism spectrum disorders. PLoS One 2012; 7(12):e49475.
22. O'Roak B J, et al. Multiplex targeted sequencing identifies recurrently mutated genes in autism spectrum disorders. Science 2012; 338:1619-22.
23. Klin A, Lin D J, Gorrindo P, Ramsay G, Jones W. Two-year-olds with autism orient to non-social contingencies rather than biological motion. Nature 2009; 459:257-61.
24. Jones W, Carr K, Klin A. Absence of preferential looking to the eyes of approaching adults predicts level of social disability in 2-year-old toddlers with autism spectrum disorder. Arch Gen Psychiatry 2008; 65:946-54.
25. Shic F, Bradshaw J, Klin A, Scassellati B, Chawarska K. Limited activity monitoring in toddlers with autism spectrum disorder. Brain Res 2011; 1380:246-54.
26. Bedford R, et al. Precursors to social and communication difficulties in infants at-risk for autism: gaze following and attentional engagement. J Autism Dev Disord 2012.
27. Chawarska K, Macari S, Shic F. Context modulates attention to social scenes in toddlers with autism. J Child Psychol Psychiatry 2012; 53:903-13.
28. Pierce K, Conant D, Hazin R, Stoner R, Desmond J. Preference for geometric patterns early in life as a risk factor for autism. Archives of General Psychiatry 2011; 68:101-9.
29. Luyster R, et al. The Autism Diagnostic Observation Schedule-toddler module: a new module of a standardized diagnostic measure for autism spectrum disorders. J Autism Dev Disord 2009; 39:1305-20.
30. Lord C, et al. The Autism Diagnostic Observation Schedule-Generic: a standard measure of social and communication deficits associated with the spectrum of autism. Journal of Autism and Developmental Disorders 2000; 30:205-223.
31. Wetherby A M, Allen L, Cleary J, Kublin K, Goldstein H. Validity and reliability of the communication and symbolic behavior scales developmental profile with very young children. Journal of Speech, Language, and Hearing Research 2002; 45:1202-18.
32. Burrier et al. INSAR abstract, 2013.

EXAMPLES

Example 1

Disrupted Gene Networks in Autistic Toddlers Underlie Early Brain Maldevelopment and Provide Accurate Classification

Genetic mechanisms underlying abnormal early neural development in toddlers with Autism Spectrum Disorder (ASD) remain unknown, and no genetic or functional genomic signatures exist to detect risk for ASD during this period. The objective in this example was to identify functional genomic abnormalities underlying neural development and risk signatures in ASD.

A general naturalistic population screening approach was used to allow prospective, unbiased recruitment and study of ASD and control (typically developing and contrast) toddlers from community pediatric clinics. Whole-genome leukocyte expression and MRI-based neuroanatomic measures were analyzed in a Discovery sample of 142 males ages 1-4 years. Co-expression analyses were applied to identify gene modules associated with variations in neuroanatomic measures and a candidate genomic signature of ASD. Class comparison and network analyses were used to identify dysregulated genes and networks in ASD toddlers. Results were compared to a Replication sample of 73 toddlers.

Correlations of gene expression profiles with the deviation of neuroanatomic measures from normative values for age were computed in ASD and control toddlers. Classification performance was tested using logistic regression and ROC analysis. In control toddlers, cell cycle and protein folding gene networks were strongly correlated with brain size, cortical surface area, and cerebral gray and white matter; in ASD toddlers these correlations were weak. ASD toddlers instead displayed correlations with an abnormal array of different gene networks, including immune/inflammation, cell adhesion, and translation. DNA-damage response and mitogenic signaling were the most similarly dysregulated pathways in both Discovery and Replication samples. A genomic signature enriched in immune/inflammation and translation genes displayed 75-82% classification accuracy.

The functional genetic pathology that underlies early brain maldevelopment in ASD involves the disruption of processes governing neuron number and synapse formation and the abnormal induction of collateral gene networks. The orderly correlation between the degree of gene network dysregulation and brain size suggests that there may be a common set of underlying abnormal genetic pathways in a large percentage of ASD toddlers. Knowledge of these pathways will facilitate the discovery of early biomarkers, leading to earlier treatment, and of common biological targets for bio-therapeutic intervention in a majority of affected individuals.

Significant advances have previously been made in understanding the genetic¹⁻³ and neural bases⁴⁻⁶ of autism spectrum disorder (ASD). However, links between these two fundamental biological domains in ASD have yet to be established. Clinical macrocephaly at young ages occurs in an estimated 12% to 37% of patients, whereas a subgroup has small brain size, and genetic explanations for this wide variation remain uncertain. Moreover, genetic signatures of risk for ASD in infants and toddlers in the general pediatric clinic have not yet been found.

A long-theorized brain-gene link is supported by new ASD postmortem evidence, at least in ASD with brain enlargement. The theory⁷ is that early brain overgrowth, which occurs in the majority of ASD cases^(4,5,8-12), may result from an overabundance of neurons due to prenatal dysregulation of processes that govern neurogenesis, such as cell cycle and/or apoptosis. A recent postmortem study discovered an overabundance of neurons in prefrontal cortex, a region that contributes to autistic symptoms, in ASD children with brain enlargement⁶, and a second postmortem study reported abnormal gene expression in cell cycle and apoptosis pathways, also in prefrontal cortex, in ASD male children². Gene pathways identified in the latter postmortem study are consistent with those identified by CNV pathway enrichment analyses in living ASD patients¹³. A complementary theory is that synapse abnormalities may also be involved in ASD¹⁴, but how these may relate to early brain growth variation is unknown. Because direct analyses of brain-genome relationships during early development have never been done in ASD, it remains unknown whether genetic dysregulation of cell cycle, apoptosis and/or synapse processes underlies variation in brain growth and size in the majority of ASD toddlers. Since neuron number and synapse formation and function are developmentally foundational and drive brain size, common pathways leading to ASD may involve their dysregulation.

A novel study of genomic-brain relationships in vivo was performed in ASD and control toddlers. Unique to this study was that all toddlers came from a general naturalistic population screening approach that allows for the unbiased, prospective recruitment and study of ASD, typically developing (TD) and contrast toddlers as they occur in community pediatric clinics. Unbiased data-driven bioinformatics methods were used to discover functional genomic abnormalities that are correlated with brain anatomy at the age of clinical onset in ASD and that distinguish ASD toddlers from TD and contrast toddlers. This naturalistic general population approach also makes it possible to test whether some functional genomic abnormalities might provide candidate diagnostic signatures of risk for ASD at very young ages.

Methods

Subjects, Tracking and Clinical Measures

Participants were 215 males ages 1-4 years retained after exclusion of five low-quality arrays (see Blood Sample Collection and Processing). Before exclusion, 147 toddlers were in a Discovery sample (N=91 ASD, 56 control) and 73 (N=44 ASD, 29 control) were in a Replication sample. Toddlers were recruited via the 1-Year Well-Baby Check-Up Approach from community pediatric clinics¹⁵ (see Methods in Example 2), which enables a general naturalistic population screening approach for prospective study of ASD, typically developing subjects and contrast patients. In this approach, parents of toddlers completed a broadband developmental screen at their pediatrician's office, and toddlers were referred, evaluated and tracked over time. This provided an unbiased recruitment of toddlers representing a wide range and variety of ability and disability. Blood samples for gene expression and DNA analysis, and MRI brain scans, were collected from a subset of subjects at the time of referral, regardless of referral reason, and before final diagnostic evaluations. Every subject was evaluated using multiple tests, including the appropriate module of the Autism Diagnostic Observation Schedule (ADOS)^(16,17) and the Mullen Scales of Early Learning¹⁸. Parents were interviewed with the Vineland Adaptive Behavior Scales¹⁹ and a medical history interview. Subjects younger than 3 years of age at the time of blood draw were longitudinally re-evaluated, diagnostically and psychometrically, every 6-12 months until their 3rd birthday, when a final diagnosis was given. Subjects were divided into two study groups: ASD and control. The control group was composed of typically developing (TD) and contrast (e.g., language, global developmental or motor delay) toddlers (Table 4).

TABLE 4
Summary of subject characteristics and clinical information

Columns: Discovery ASD | Discovery TD | Discovery Contrast | Replication ASD | Replication TD | Replication Contrast

Subject characteristics
Age in years, mean (SD) | 2.3 (0.7) | 2.0 (0.9) | 1.5 (0.6) | 2.3 (0.8) | 1.6 (0.7) | 1.2 (0.2)
Ethnicity: Hispanic | 24 | 5 | 2 | 13 | 3 | 1
Race: Caucasian | 44 | 29 | 9 | 23 | 17 | 3
Race, other categories (nonzero cells, in column order): Asian 4, 2, 1, 2; African-American 1, 1, 1, 1; Mixed 13, 4, 1, 6, 3; Indian 1; Unknown 1

Diagnosis, n
AD: Discovery ASD 77; Replication ASD 31
TD: Discovery TD 41; Replication TD 25
PDD-NOS: Discovery ASD 10; Replication ASD 13
Language delayed‡: Discovery Contrast 9; Replication Contrast 2
Globally developmentally delayed§: 1
Radiological abnormality: Discovery Contrast 1; Replication Contrast 1
Premature birth, testing normally‖: 2
Socially emotionally delayed¶: 1
Drug exposure#: 1

Clinical information, mean (SD)
Mullen Scales of Early Learning (T-scores)
Visual Reception | 39.7 (11.0) | 59.0 (10.3) | 48.1 (9.0) | 40.6 (13.6) | 51.6 (10.2) | 44.3 (4.5)
Fine Motor | 37.3 (12.2) | 55.9 (9.1) | 55.8 (8.4) | 40.1 (16.0) | 57.5 (8.5) | 55.7 (2.9)
Receptive Language | 29.1 (12.0) | 52.4 (8.3) | 46.9 (8.5) | 31.6 (16.1) | 50.7 (10.2) | 36.7 (4.9)
Expressive Language | 29.1 (11.4) | 53.7 (9.5) | 46.3 (7.9) | 31.4 (16.4) | 52.0 (8.6) | 41.0 (2.6)
ADOS (Module T: Social Affect total; Modules 1 and 2: Communication + Social Interaction total)
ADOS CoSo/SA score* | 15.0 (3.9) | 2.1 (1.7) | 0.6 (1.1) | 12.8 (4.8) | 2.4 (2.2) | 5.0 (5.0)
ADOS RRB score | 4.1 (1.9) | 0.3 (0.5) | 4.1 (4.7) | 2.5 (1.6) | 0.3 (0.4) | 0.7 (1.2)
ADOS total score | 19.1 (4.7) | 2.4 (1.9) | 2.1 (2.5) | 15.3 (5.4) | 2.6 (2.3) | 5.7 (6.0)
Early Learning Composite | 71.0 (16.2) | 110.5 (12.4) | 98.7 (11.4) | 76.1 (21.6) | 106.0 (12.9) | 89.3 (6.7)
Vineland scores (VABS)† | 82.2 (9.4) | 101.6 (9.3) | 92.4 (7.6) | 83.6 (14.1) | 100.8 (7.3) | 95.0 (1.0)

‡ >1 standard deviation below expected values on the language subtests of the Mullen.
§ >1 standard deviation below expected values on 3 or more of the subtests of the Mullen, with an overall developmental quotient >1 standard deviation below expected values (i.e., <85).
‖ <37 weeks gestation.
¶ Diagnosis of social emotional delay.
# Mother with drug exposure during pregnancy.
* Discovery: 64% of the ASD group received ADOS Module T (Toddler Module), 31% Module 1, and 5% Module 2. Replication: 32% received Module T, 48% Module 1, and 20% Module 2.
† Vineland Adaptive Behavior Scales, Adaptive Behavior Composite score.

Blood Sample Collection and Processing

Leukocytes were captured using LEUKOLOCK filters (Ambion, Austin, Tex.) from 4 to 6 ml of blood (see Methods in Example 2) for Discovery and Replication samples. RNA samples in the Discovery set were tested on the Illumina Human-HT12_v.4 platform, while the Illumina WG-6 platform was used for the Replication set. Five low-quality arrays were identified and excluded from statistical analyses (see Methods in EXAMPLE 2). Final samples were 87 ASD and 55 control Discovery toddlers and 44 ASD and 29 control Replication toddlers (Table 4).

MRI Scanning and Neuroanatomical Measurement

MRI data were obtained during natural sleep from Discovery toddlers (65 ASD, 38 control) whose parents consented to scanning. Twelve neuroanatomic measurements were obtained using a semi-automated pipeline integrating modified features of FSL and BrainVisa (fmrib.ox.ac.uk/fsl/; brainvisa.info), and included total brain volume, left and right cerebral gray and white matter volumes, left and right cerebral cortical surface areas, left and right cerebellar gray and white matter volumes, and brainstem volume (see Methods in EXAMPLE 2).

Statistical and Bioinformatic Analyses

Statistical analyses were performed on normalized and filtered expression data. Effects of age on neuroanatomic measures were removed via Generalized Additive Models (R package gam, v1.06.2)²⁰.
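For illustration, the age-correction step can be sketched as follows, assuming each neuroanatomic measure is regressed on a smooth function of age and the residual (the deviation from the age-expected value) is carried into later analyses. To stay self-contained, this sketch replaces the penalized-spline smoother of a GAM with an ordinary least-squares cubic polynomial in age; the helper name `age_correct`, the polynomial degree and the synthetic data are hypothetical.

```python
import numpy as np


def age_correct(measure, age, degree=3):
    """Remove a smooth age trend from a neuroanatomic measure.

    Simplified stand-in for GAM-based correction: the smooth age term is
    approximated by a least-squares polynomial of the given degree
    (an assumption; a GAM would fit a penalized spline instead).
    Returns residuals, i.e., deviations from the value expected for age.
    """
    measure = np.asarray(measure, dtype=float)
    age = np.asarray(age, dtype=float)
    # Center age to keep the polynomial design matrix well conditioned.
    centered = age - age.mean()
    coefs = np.polyfit(centered, measure, deg=degree)
    expected = np.polyval(coefs, centered)
    return measure - expected


# Hypothetical example: 100 toddlers aged 1-4 years whose TBV grows with age.
rng = np.random.default_rng(0)
age = rng.uniform(1.0, 4.0, size=100)
tbv = 900.0 + 120.0 * np.log(age) + rng.normal(0.0, 25.0, size=100)
tbv_resid = age_correct(tbv, age)
```

Because the fit includes an intercept and a linear age term, the residuals have zero mean and zero linear correlation with age by construction; a GAM additionally captures nonlinear age trends without fixing the polynomial degree in advance.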

Co-expression analysis (WGCNA) was used to identify gene modules across all Discovery subjects and within each study group separately (see Methods in EXAMPLE 2). WGCNA analysis and Pearson and Spearman correlations were used to identify associations between gene expression patterns and neuroanatomy across all Discovery toddlers. Gene Significance (correlation of gene expression level with phenotype) and Module Membership (gene connectivity within each module) were also computed using WGCNA (see Methods in Example 2). Class comparison analyses were performed using a random variance model with 10,000 univariate permutation tests in BRB-ArrayTools (linus.nci.nih.gov/BRB-ArrayTools.html). MetaCore software was used for pathway enrichment analyses. Hypergeometric probability (Hyp. P) was used to test the significance of Venn analyses versus random gene sets of equal size (see Methods in EXAMPLE 2). Differentially expressed (DE) genes from Discovery toddlers were used to identify a potential gene expression signature of ASD. Four DE modules were selected based on AUC performance in classification of Discovery toddlers using a logistic regression function (glmnet). CNVision was used to call copy number variations (CNVs) in misclassified ASD subjects as previously described.^(2,21)

Results

The majority of Discovery and Replication subjects were of Caucasian origin. Pearson's chi-squared test showed no significant difference in race/ethnicity distribution between ASD and control toddlers (Discovery χ²=7.98, P=0.1569; Replication χ²=7.19, P=0.2065).
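The race/ethnicity comparison is a Pearson chi-squared test of independence on a group-by-category contingency table. A self-contained sketch follows; the helper names are hypothetical, the p-value uses the series expansion of the regularized lower incomplete gamma function rather than a statistics library, and the counts in the usage line are made up for illustration.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable.

    Computes 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function evaluated by its power series (adequate for the small
    statistics seen in such comparisons).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def pearson_chi2(table):
    """Pearson chi-squared test of independence for a groups x categories table."""
    obs = np.asarray(table, dtype=float)
    # Expected counts under independence: row total * column total / grand total.
    expected = obs.sum(axis=1, keepdims=True) * obs.sum(axis=0, keepdims=True) / obs.sum()
    stat = float(((obs - expected) ** 2 / expected).sum())
    df = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts (rows: ASD, control; columns: race/ethnicity categories).
stat, df, p = pearson_chi2([[44, 24, 6, 13], [38, 7, 5, 6]])
```

Assuming 5 degrees of freedom (for example, six race/ethnicity categories compared across two groups), `chi2_sf(7.98, 5)` evaluates to roughly 0.16, in line with the Discovery P value reported above.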

Across ASD and control toddlers, age-corrected MRI total brain volume (TBV) measures followed a normal distribution with no statistically significant difference between groups (FIG. 1A, P=0.645); the same held for the other neuroanatomic measures.

After filtering across all Discovery subjects, 12208 gene probes wereused for downstream analyses.

Different Gene-Networks Underlie Variation of Neuroanatomic Measures in ASD and Control Groups

WGCNA Across Combined ASD and Control Groups

Unsupervised co-expression analysis using WGCNA identified 22 modules of co-expressed genes (see FIG. 5), with eigengene values computed for each module and each ASD and control subject. Of these 22 modules, seven were consistently correlated with neuroanatomic measures across all subjects, including TBV, cerebral gray matter, cerebral white matter and cerebral cortical surface area (Table 5, FIG. 6), and displayed statistically significant enrichments (P<0.05, FDR<0.05; FIG. 1B). The greenyellow and grey60 gene modules displayed the strongest correlations with brain and cerebrum volumes across groups (Table 5), and all seven modules were associated with TBV measures. The greenyellow module displayed top enrichment in cell cycle functions, while protein folding genes were highest in the grey60 module (FIG. 1B, Table 8). Seven different gene modules were instead associated with diagnosis (see Table 26 in EXAMPLE 2); Metacore analysis displayed no significant enrichment for the most strongly correlated of these modules, followed by modules enriched in cell cycle, translation and inflammation genes (see Table 26 in EXAMPLE 2).
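In WGCNA, a module eigengene is the first principal component of the standardized expression of a module's genes, so it summarizes each module with one value per subject that can then be correlated with each age-corrected neuroanatomic measure, as in Table 5. A minimal numpy sketch follows; the synthetic data, the helper name `module_eigengene`, and the sign convention (aligning the eigengene with average module expression) are illustrative assumptions.

```python
import numpy as np


def module_eigengene(expr):
    """First principal component of a samples x genes module matrix."""
    x = np.asarray(expr, dtype=float)
    # Standardize each gene across samples (mean 0, SD 1).
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Left singular vector scaled by its singular value = PC1 scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0] * s[0]
    # Flip sign so the eigengene tracks the module's average expression.
    if np.corrcoef(me, z.mean(axis=1))[0, 1] < 0:
        me = -me
    # Scores are already centered; rescale to unit variance.
    return me / me.std(ddof=1)


# Hypothetical module: 30 genes driven by one latent factor in 80 subjects.
rng = np.random.default_rng(1)
latent = rng.normal(size=80)
loadings = rng.uniform(0.5, 1.5, size=30)
expr = latent[:, None] * loadings + rng.normal(0.0, 0.5, size=(80, 30))
me = module_eigengene(expr)

# Pearson correlation of the eigengene with a hypothetical age-corrected TBV.
tbv_resid = -0.6 * latent + rng.normal(0.0, 1.0, size=80)
r = np.corrcoef(me, tbv_resid)[0, 1]
```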

TABLE 5
WGCNA association analysis (Pearson correlation) of module eigengenes and age-corrected neuroanatomic measures in ASD and control toddlers together

Module | CB_GM (L/R) | CB_WM (L/R) | CBLL_GM (L/R) | CBLL_WM (L/R) | BS | TBV | Hemi_SA (L/R)
GreenYellow | −0.32***/−0.33*** | −0.3***/−0.28*** | ns/ns | ns/ns | ns | −0.31*** | −0.29***/−0.3***
Grey60 | −0.31***/−0.32*** | −0.26**/−0.24** | ns/ns | ns/ns | ns | −0.3*** | −0.26**/−0.27**
Cyan | 0.21*/0.2* | 0.18*/0.17* | ns/ns | ns/ns | ns | 0.18* | 0.14*/0.15*
Turquoise | 0.18*/0.19* | 0.17*/0.17* | ns/ns | ns/ns | ns | 0.19* | 0.16*/0.16*
Yellow | −0.2*/−0.21* | −0.17*/−0.17* | ns/ns | ns/ns | ns | −0.19* | −0.18*/−0.18*
LightGreen | −0.19*/−0.21* | −0.18*/−0.17* | ns/ns | ns/ns | ns | −0.21* | −0.15*/−0.17*
MidnightBlue | 0.21*/0.22** | ns/ns | ns/ns | ns/ns | ns | 0.21* | 0.21*/0.21*

Significance codes (also apply to FIG. 1B): p-value *** <0.001; ** <0.01; * <0.05. L = Left, R = Right, CB = Cerebrum, CBLL = Cerebellum, GM = Gray Matter, WM = White Matter, TBV = Total Brain Volume, Hemi = hemisphere, SA = Surface Area, BS = Brain Stem, ns = not significant.

In control toddlers, only the cell cycle and protein folding module eigengenes (MEs) were strongly correlated with TBV and all cerebral measures (Tables 6 and 8). In contrast, ASD toddlers displayed correlations with several MEs, the strongest being cell adhesion, inflammation and cytoskeleton regulation and the weakest being cell cycle, protein folding and translation (Tables 6 and 8). Unlike in control toddlers, cell cycle and protein folding MEs in ASD toddlers were not significantly correlated with cerebral white matter measures; instead, cerebral white matter volume was correlated most strongly with the cell adhesion ME and, to a lesser extent, with the inflammation and cytoskeleton regulation MEs (Table 6). Linear modeling of MEs against TBV variation (from small to large brains) showed that cell cycle and protein folding genes have the highest expression in normal small brains, whereas translation, cell adhesion, cytoskeleton and inflammation genes have reduced to neutral effects (FIG. 1C). Conversely, the combination of reduced activity of cell cycle and protein folding genes with increased expression of cell adhesion, cytoskeleton and inflammation genes appears to drive pathological brain enlargement in ASD (FIG. 1C).

TABLE 6
Pearson and Spearman correlations of module eigengenes and age-corrected neuroanatomic measures in ASD and control toddlers separately

Test | CB_GM (L/R) | CB_WM (L/R) | TBV | Hemi_SA (L/R)

Control toddlers
Grey60 (top network: Protein folding_ER and cytoplasm)
Pearson | −0.41**/−0.42** | −0.49**/−0.48** | −0.47** | −0.4*/−0.42**
Spearman | −0.47^^/−0.46^^ | −0.49^^/−0.49^^ | −0.53^^^ | −0.4^/−0.41^
GreenYellow (top network: Cell cycle_core)
Pearson | −0.43**/−0.44** | −0.42**/−0.41* | −0.44** | −0.39*/−0.43**
Spearman | −0.44^^/−0.45^^ | −0.37^/−0.36^ | −0.42^^ | −0.31^/−0.35^

ASD toddlers
MidnightBlue (top network: Cell adhesion_integrin-mediated)
Pearson | 0.35**/0.37** | 0.29*/0.26* | 0.35** | 0.33**/0.34**
Spearman | 0.42^^^/0.41^^^ | 0.31^/0.3^ | 0.4^^ | 0.42^^^/0.41^^^
Turquoise (top network: Inflammation_interferon signaling)
Pearson | 0.29*/0.29* | ns/ns | 0.29* | 0.26*/0.25*
Spearman | 0.39^^/0.38^^ | 0.29^/0.29^ | 0.39^^ | 0.35^^/0.31^
Cyan (top network: Cytoskeleton_regulation and rearrangement)
Pearson | 0.31*/0.31* | ns/ns | 0.27* | 0.24*/0.25*
Spearman | 0.26^/0.27^ | 0.25^/ns | 0.28^ | ns/ns
Yellow (top network: Translation_regulation of initiation)
Pearson | −0.25*/−0.25* | ns/ns | ns | ns/ns
Spearman | −0.3^/−0.3^ | ns/ns | −0.27^ | −0.27^/−0.25^
GreenYellow (top network: Cell cycle_core)
Pearson | −0.25*/−0.26* | ns/ns | ns | ns/ns
Spearman | −0.26^/−0.3^ | ns/ns | −0.27^ | −0.28^/−0.27^
Grey60 (top network: Protein folding_ER and cytoplasm)
Pearson | −0.25*/−0.25* | ns/ns | ns | ns/ns
Spearman | −0.29^/−0.29^ | ns/ns | −0.28^ | −0.28^/−0.3^

Significance codes: Pearson p-value *** <0.001, ** <0.01, * <0.05; Spearman p-value ^^^ <0.001, ^^ <0.01, ^ <0.05. Correlations relate to FIG. 1C, D. L = Left, R = Right, CB = Cerebrum, GM = Gray Matter, WM = White Matter, TBV = Total Brain Volume, Hemi = hemisphere, SA = Surface Area, ns = not significant.

Network Pattern Alterations in ASD vs. Control Groups

Calculation of the Gene Significance (GS) value for each module provides a measure of the impact of co-expressed genes on normal and pathologic brain size variation. Correlation analysis between GS and intramodular Gene Connectivity (GC) revealed a major rearrangement of activity patterns across several gene networks (see Tables 21-23 for the genes with highest GS and GC). Twelve (12) of the 22 modules displayed a shift in pattern direction (from negative to positive or not significant, and vice versa), suggesting that for each of these 12 modules the impact of hub genes on brain size variation was significantly altered in ASD compared to control toddlers (FIG. 7). Importantly, cell cycle and protein folding hub genes displayed reduced GS values in ASD toddlers, while a substantial gain in GS was observed for hub genes in the cytoskeleton, inflammation, cell adhesion and translation modules (FIG. 2). Similar analyses, assessing the specificity of a gene to a module (Module Membership, MM) with respect to its GS, supported the alterations in gene connectivity (FIGS. 2A-2C; see Tables 24 & 25 for the genes with highest MM).
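Gene Significance, Module Membership and intramodular connectivity can be sketched together on one synthetic module. Following the usual WGCNA definitions, GS is the Pearson correlation of each gene with an age-corrected neuroanatomic measure, MM is its correlation with the module eigengene, and intramodular connectivity is the row sum of a soft-thresholded adjacency |cor|^β; the soft-threshold power β=6, the synthetic data and the helper names are assumptions for illustration.

```python
import numpy as np


def column_correlations(expr, vector):
    """Pearson correlation of every column (gene) of expr with one vector."""
    x = np.asarray(expr, dtype=float)
    v = np.asarray(vector, dtype=float)
    xc = x - x.mean(axis=0)
    vc = v - v.mean()
    return (xc.T @ vc) / (np.sqrt((xc ** 2).sum(axis=0)) * np.sqrt((vc ** 2).sum()))


def intramodular_connectivity(expr, beta=6):
    """Sum of soft-thresholded |correlation| to the other genes of the module."""
    adj = np.abs(np.corrcoef(np.asarray(expr, dtype=float), rowvar=False)) ** beta
    np.fill_diagonal(adj, 0.0)  # a gene is not connected to itself
    return adj.sum(axis=1)


# Hypothetical module of 20 genes with graded dependence on one latent factor.
rng = np.random.default_rng(2)
latent = rng.normal(size=100)
loadings = np.linspace(0.2, 2.0, 20)
expr = latent[:, None] * loadings + rng.normal(size=(100, 20))
tbv_resid = latent + rng.normal(0.0, 0.5, size=100)  # hypothetical trait

# Module eigengene: PC1 of the standardized module, sign aligned to the mean.
z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
u, s, _ = np.linalg.svd(z, full_matrices=False)
eigengene = u[:, 0] * s[0]
if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
    eigengene = -eigengene

gs = column_correlations(expr, tbv_resid)   # Gene Significance
mm = column_correlations(expr, eigengene)   # Module Membership
k_within = intramodular_connectivity(expr)  # intramodular connectivity

# Module-level agreement between GS and connectivity / membership.
gs_vs_k = np.corrcoef(np.abs(gs), k_within)[0, 1]
gs_vs_mm = np.corrcoef(np.abs(gs), mm)[0, 1]
```

Comparing `gs_vs_k` (or `gs_vs_mm`) computed separately in ASD and control subjects is one way to see the kind of sign or strength shift described above for individual modules.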

WGCNA in ASD and Control Groups Separately

To further test for ASD-specific gene expression relationships to brain development, the same 12208 gene probes were analyzed by WGCNA within each study group (ASD, control) separately. Of 20 control-based co-expression modules, only 2 were significantly and strongly correlated with brain volume and cerebral measures (FIG. 8). As in the across-groups analysis above, these two modules were enriched in cell cycle and protein folding genes and displayed high GS values for normal TBV variation (FIG. 3; Table 9). Of 22 ASD-based co-expression modules, 11 were significantly correlated with one or more neuroanatomic measures (FIG. 9). These 11 modules had GS values consistent with the across-groups analysis and, unlike the control-based modules, were enriched in multiple functional domains, including immune, inflammation, cell adhesion, translation, and development (FIG. 3, Table 10).

DNA-Damage and Mitogenic Gene-Networks are Consistently Dysregulated in ASD vs. Control

Class comparison analyses between ASD and control toddlers found 2765 unique differentially expressed (DE) genes (see Table 16). Metacore enrichment identified immune/inflammation response, DNA-damage/apoptosis and cell cycle regulation pathways, as well as apoptosis, as the top dysregulated process networks (Table 11). Pathway comparison between the Discovery and Replication datasets indicated that DNA-damage response and mitogenic signaling were the most similarly and significantly dysregulated pathways in both samples (FIG. 4A, Table 12). At the gene level, 405 genes were commonly dysregulated and accounted primarily for networks involved in cell number regulation (FIG. 4B, Tables 13 and 17).

Venn analysis between the group-specific gene modules associated with neuroanatomic measures and the 2765 DE genes showed that 12.7% (37/290; Hyp. P=0.38) of the genes in control-specific modules and 27.1% (786/2894; Hyp. P=1.8e-127) of the genes in ASD-specific modules were differentially expressed.
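The hypergeometric probability attached to a Venn overlap asks how likely an overlap at least this large would be if a gene set of the same size were drawn at random from the same gene universe. A minimal sketch using log-gamma arithmetic to avoid huge integers follows; the one-sided upper-tail formulation, the helper names and the choice of universe (here the 12208 filtered probes) are assumptions, since the exact procedure is not stated, so values from this sketch need not reproduce the Hyp. P values reported above.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_upper_tail(k, universe, hits, draws):
    """P(overlap >= k) when `draws` genes are sampled without replacement
    from `universe` genes, of which `hits` are differentially expressed."""
    lo = max(k, draws - (universe - hits), 0)  # smallest feasible overlap
    hi = min(hits, draws)                      # largest feasible overlap
    if lo > hi:
        return 0.0
    log_total = log_comb(universe, draws)
    logs = [log_comb(hits, i) + log_comb(universe - hits, draws - i) - log_total
            for i in range(lo, hi + 1)]
    peak = max(logs)  # log-sum-exp for numerical stability
    return min(1.0, math.exp(peak) * sum(math.exp(v - peak) for v in logs))


# Hypothetical module of 300 genes overlapping 100 of 2765 DE genes (of 12208).
p_overlap = hypergeom_upper_tail(100, 12208, 2765, 300)
```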

Key genes in the DNA-damage and mitogenic signaling categories were CDKs, CREB1, ATM, 14-3-3s, AKT, BCL2, PCNA, STAT1, PI3K, Beta-catenin, Caspases, NUMA1, NFBD1, PP2A, RADs and MAPKs (Tables 18 & 19).

Module-Based Classification Efficiently Distinguishes ASD from Control Toddlers

Co-expression analysis of the 2765 DE genes using WGCNA found 12 gene modules, and eigengenes were calculated for each subject and each module (FIG. 10). Four of these module eigengenes were used in the classification analysis together with the subject's age as a predictor. Logistic regression of diagnosis with age as the predictor produced an odds ratio of 1.07 (P<0.05), and classification without age was 3-4% less accurate (data not shown). Of the 405 genes dysregulated in both Discovery and Replication subjects, 24.2% (98/405; Hyp. P=2.7e-48) were represented in these four modules. Logistic regression with repeated (3×) 10-fold cross-validation and ROC analysis displayed high AUC in both Discovery (training set) and Replication (independent validation test set) toddlers, with 82.5% and 75% accuracy, respectively (FIG. 4C, D; Table 14). While specificity remained high across the different class comparisons, accuracy and sensitivity decreased as the sample size was reduced (FIG. 4D).
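The classification step can be sketched as a logistic regression on module eigengenes plus age, evaluated by repeated (3×) 10-fold cross-validation, with ROC AUC, accuracy, sensitivity and specificity computed from out-of-fold predictions. This is a simplified stand-in, not the glmnet procedure itself: it fits a fixed ridge penalty by gradient descent instead of an elastic-net path with penalty selection, uses unstratified folds, and runs on synthetic data; all function names are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, l2=0.01, lr=0.1, iters=1000):
    """Ridge-penalized logistic regression fitted by batch gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (p - y) / len(y)
        grad[1:] += l2 * w[1:]  # the intercept is not penalized
        w -= lr * grad
    return w


def predict_proba(w, X):
    return 1.0 / (1.0 + np.exp(-(w[0] + X @ w[1:])))


def roc_auc(scores, y):
    """AUC = probability a random case outscores a random control (ties count 0.5)."""
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def repeated_cv(X, y, folds=10, repeats=3, threshold=0.5, seed=0):
    """Mean out-of-fold AUC, accuracy, sensitivity and specificity."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(repeats):
        idx = rng.permutation(len(y))
        prob = np.empty(len(y))
        for test in np.array_split(idx, folds):
            train = np.setdiff1d(idx, test)
            # Standardize with training-fold statistics only.
            mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
            sd[sd == 0] = 1.0
            w = fit_logistic((X[train] - mu) / sd, y[train])
            prob[test] = predict_proba(w, (X[test] - mu) / sd)
        pred = (prob >= threshold).astype(int)
        results.append([
            roc_auc(prob, y),
            np.mean(pred == y),          # accuracy
            np.mean(pred[y == 1] == 1),  # sensitivity
            np.mean(pred[y == 0] == 0),  # specificity
        ])
    return dict(zip(["auc", "accuracy", "sensitivity", "specificity"],
                    np.mean(results, axis=0)))


# Hypothetical predictors: four module eigengenes plus age for 140 toddlers.
rng = np.random.default_rng(3)
X = rng.normal(size=(140, 5))
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(140) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
metrics = repeated_cv(X, y)
```

Because standardization is estimated on each training fold only, held-out subjects do not leak into the fit; a model refitted on all training subjects would then be applied unchanged to an independent set such as the Replication toddlers.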

Characteristics of Genes in the Classification Signature

Metacore analysis of the four-module classifier displayed significant enrichment in translation and immune/inflammation genes (Table 14). DAPPLE analysis (broadinstitute.org/mpg/dapple)²⁰ of these gene modules revealed a statistical enrichment for protein-protein interactions (P<0.001). We next created a classification network based on the genes with the highest number of interactions. Consistent with the enrichment findings, a substantial number of ribosomal and translation genes were positioned at the center of the network (FIG. 4E). Enrichment analysis of the DAPPLE priority genes confirmed translation initiation as the top process network (P=4e-18). Moreover, 17.2% of the classifier genes (131/762; Hyp. P=0.046) were located within autism-relevant CNVs (mindspec.org/autdb.html) smaller than 1 Mb. This is in line with previous findings²¹ suggesting CNVs as one potential genetic mechanism of gene expression dysregulation²².

Comparison with recently reported classifiers^(23,24) displayed modest to low overlap in gene content. Of the reported genes, twelve (12/55) and eighteen (18/43), respectively, were differentially expressed in the Discovery subjects, but only two and one of these, respectively, were present in our classifier (Table 15).

Prediction Performance and Subject Characteristics

Prediction performance of all classified subjects (n=215) was correlated with age, diagnostic subgroups, and clinical and brain measures. Misclassified ASD toddlers were significantly younger, and misclassified control toddlers were significantly older, than their correctly classified peers (FIG. 11); no other measure was found to be significantly different (FIG. 12).

A majority of the subjects were Caucasian, Hispanic or Mixed (58.4%, 22.4%, and 12.6%, respectively). Of these groups, Mixed and Hispanic subjects were more accurately classified (97% and 88%) than Caucasians (74%). At a 0.5 threshold, 12 of the 14 misclassified ASD subjects were genotyped for CNV analyses. A rare CNV of known ASD etiology, a CNTNAP2 duplication, was found in only one subject (Table 7).

TABLE 7
CNV analysis of misclassified ASD subjects

Subject ID | CNV location (hg18) | Size (bp) | DEL/DUP | Genes involved
X3F5T | chr6:169182781-169723045 | 540,264 | DUP | AK055570, BX648586, THBS2, WDR27
X3F5T | chr7:147713357-147819466 | 106,109 | DUP | CNTNAP2, LOC392145
M8K5X | chr20:47589174-47611381 | 22,207 | DEL | PTGIS
Y2B4P | chr15:21744675-21778678 | 34,003 | DEL | Intergenic (NDN, AK124131)
J3L5W | chr1:231796069-231813713 | 17,644 | DEL | Intergenic (KIAA1804, KCNK1)
L5S3Z | chr1:242582713-242601647 | 18,934 | DEL | C1orf100
X2H3X | chr12:72374075-72391780 | 17,705 | DUP | Intergenic (TRHDE, BC061638)
J3L5W | chr14:19754117-19825127 | 71,010 | DEL | OR11H4, OR11H6
S3D7F | chr5:12578748-12906570 | 327,822 | DEL | AY328033, AY330599
Z3W7W | chr6:132884089-132906241 | 22,152 | DEL | TAAR9

DEL = heterozygous deletion, DUP = duplication. Reference genome: hg18.

TABLE 8
Process networks (Metacore) enrichment for the seven modules with significant association with neuroanatomic measures by WGCNA across ASD and control toddlers

# | Network | pValue | Ratio

GreenYellow_genelist
1 | Cell cycle_Core | 9.34E−40 | 40/115
2 | Cell cycle_Mitosis | 2.27E−36 | 44/179
3 | Cytoskeleton_Spindle microtubules | 2.05E−30 | 33/109
4 | Cell cycle_S phase | 2.59E−29 | 36/149
5 | Cell cycle_G2-M | 9.81E−17 | 29/206
6 | Cell cycle_G1-S | 1.14E−10 | 20/163
7 | Cytoskeleton_Cytoplasmic microtubules | 6.53E−07 | 13/115
8 | DNA damage_DBS repair | 7.23E−07 | 13/116
9 | DNA damage_Checkpoint | 1.56E−06 | 13/124
10 | Cell cycle_Meiosis | 1.77E−06 | 12/106
11 | Proteolysis_Ubiquitin-proteasomal proteolysis | 1.67E−04 | 12/166

Cyan_genelist
1 | Cytoskeleton_Regulation of cytoskeleton rearrangement | 2.87E−04 | 7/183
2 | Development_Hemopoiesis, Erythropoietin pathway | 3.79E−04 | 6/136
3 | Cell adhesion_Integrin-mediated cell-matrix adhesion | 7.38E−04 | 7/214
4 | Cytoskeleton_Actin filaments | 1.47E−03 | 6/176

Turquoise_genelist
1 | Inflammation_Interferon signaling | 1.70E−06 | 24/110
2 | Inflammation_TREM1 signaling | 3.14E−06 | 28/145
3 | Inflammation_NK cell cytotoxicity | 4.48E−06 | 30/164
4 | Development_Blood vessel morphogenesis | 6.73E−06 | 37/228
5 | Protein folding_Folding in normal condition | 7.32E−06 | 24/119
6 | Immune response_TCR signaling | 1.52E−05 | 30/174
7 | Inflammation_Amphoterin signaling | 6.26E−05 | 22/118
8 | Chemotaxis | 8.33E−05 | 24/137
9 | Proliferation_Negative regulation of cell proliferation | 1.15E−04 | 29/184
10 | Protein folding_Response to unfolded proteins | 1.58E−04 | 15/69
11 | Apoptosis_Death Domain receptors & caspases in apoptosis | 3.30E−04 | 21/123

MidnightBlue_genelist
1 | Cell adhesion_Integrin-mediated cell-matrix adhesion | 5.41E−10 | 18/214
2 | Cell adhesion_Platelet-endothelium-leucocyte interactions | 1.24E−08 | 15/174
3 | Cell adhesion_Platelet aggregation | 2.20E−07 | 13/158
4 | Muscle contraction | 4.10E−06 | 12/173
5 | Blood coagulation | 4.26E−05 | 8/94
6 | Cytoskeleton_Actin filaments | 7.00E−04 | 9/176
7 | Cytoskeleton_Regulation of cytoskeleton rearrangement | 9.26E−04 | 9/183
8 | Inflammation_Histamine signaling | 2.57E−03 | 9/212
9 | Proliferation_Positive regulation cell proliferation | 3.40E−03 | 9/221
10 | Development_Skeletal muscle development | 3.70E−03 | 7/144

Grey60_genelist
1 | Protein folding_ER and cytoplasm | 2.34E−08 | 6/45
2 | Protein folding_Response to unfolded proteins | 3.21E−07 | 6/69
3 | Apoptosis_Endoplasmic reticulum stress pathway | 1.28E−06 | 6/87
4 | Protein folding_Folding in normal condition | 1.20E−04 | 5/119
5 | Immune response_Phagosome in antigen presentation | 4.34E−04 | 6/243
6 | Immune response_Antigen presentation | 1.23E−03 | 5/197
7 | Muscle contraction_Nitric oxide signaling in the cardiovascular system | 1.72E−03 | 4/125
8 | Protein folding_Protein folding nucleus | 1.76E−03 | 3/58

Yellow_genelist
1 | Translation_Regulation of initiation | 3.13E−08 | 19/127
2 | Translation_Translation in mitochondria | 8.83E−07 | 21/187
3 | Signal Transduction_Cholecystokinin signaling | 9.71E−06 | 14/106
4 | Inflammation_Neutrophil activation | 1.17E−04 | 19/219
5 | Immune response_Phagocytosis | 1.40E−04 | 19/222
6 | Development_Hemopoiesis, Erythropoietin pathway | 1.61E−04 | 14/136
7 | Cell adhesion_Integrin priming | 2.75E−04 | 12/110
8 | Development_EMT_Regulation of epithelial-to-mesenchymal transition | 5.07E−04 | 18/226
9 | Reproduction_Spermatogenesis, motility and copulation | 5.94E−04 | 18/229
10 | Signal transduction_WNT signaling | 7.85E−04 | 15/177
11 | Apoptosis_Anti-Apoptosis mediated by external signals via MAPK and JAK/STAT | 8.82E−04 | 15/179

LightGreen_genelist
1 | Inflammation_NK cell cytotoxicity | 4.90E−16 | 18/164
2 | Immune response_Antigen presentation | 3.35E−05 | 9/197
3 | Inflammation_Jak-STAT Pathway | 1.57E−04 | 8/188
4 | Chemotaxis | 9.52E−04 | 6/137
5 | Cell adhesion_Leucocyte chemotaxis | 1.53E−03 | 7/205

TABLE 9 Process networks (Metacore) enrichment for each of the 2 modules associated with neuroanatomic measures from the WGCNA analysis using control toddlers
# | Networks | pValue | Ratio
Magenta_genelist
1 | Cell cycle_Core | 1.73E−39 | 37/115
2 | Cell cycle_Mitosis | 9.92E−36 | 40/179
3 | Cell cycle_S phase | 1.62E−30 | 34/149
4 | Cytoskeleton_Spindle microtubules | 1.49E−29 | 30/109
5 | Cell cycle_G2-M | 1.05E−19 | 29/206
6 | Cell cycle_G1-S | 7.99E−09 | 16/163
7 | DNA damage_Checkpoint | 1.02E−07 | 13/124
8 | Cell cycle_Meiosis | 8.62E−06 | 10/106
9 | DNA damage_MMR repair | 4.69E−05 | 7/59
10 | Cytoskeleton_Cytoplasmic microtubules | 1.09E−04 | 9/115
MidnightBlue_genelist
1 | Protein folding_ER and cytoplasm | 6.17E−06 | 4/45
2 | Protein folding_Response to unfolded proteins | 3.43E−05 | 4/69
3 | Immune response_Phagosome in antigen presentation | 4.45E−04 | 5/243
4 | Proteolysis_Ubiquitin-proteasomal proteolysis | 1.02E−03 | 4/166
5 | Apoptosis_Endoplasmic reticulum stress pathway | 1.71E−03 | 3/87
6 | Immune response_Antigen presentation | 1.92E−03 | 4/197
7 | Protein folding_Folding in normal condition | 4.17E−03 | 3/119
8 | Signal transduction_Androgen receptor nuclear signaling | 4.90E−03 | 3/126

TABLE 10 Process networks (Metacore) enrichment for each of the 11 modules associated with neuroanatomic measures from the WGCNA analysis using ASD toddlers
# | Networks | pValue | Ratio
Yellow_genelist
1 | Inflammation_NK cell cytotoxicity | 7.67E−08 | 23/164
2 | Cell adhesion_Leucocyte chemotaxis | 1.22E−06 | 24/205
3 | Chemotaxis | 1.29E−06 | 19/137
4 | Inflammation_TREM1 signaling | 1.21E−05 | 18/145
5 | Immune response_TCR signaling | 1.28E−05 | 20/174
6 | Immune response_BCR pathway | 2.14E−05 | 17/137
7 | Inflammation_Innate inflammatory response | 2.31E−05 | 20/181
8 | Immune response_T helper cell differentiation | 2.85E−05 | 17/140
9 | Development_Blood vessel morphogenesis | 7.41E−05 | 22/228
10 | Signal transduction_ERBB-family signaling | 1.39E−04 | 11/75
Salmon_genelist
1 | Cell adhesion_Integrin-mediated cell-matrix adhesion | 7.07E−08 | 16/214
2 | Muscle contraction | 1.89E−07 | 14/173
3 | Cell adhesion_Platelet-endothelium-leucocyte interactions | 2.03E−07 | 14/174
4 | Cell adhesion_Platelet aggregation | 2.97E−06 | 12/158
5 | Blood coagulation | 6.50E−05 | 8/94
6 | Cytoskeleton_Actin filaments | 1.07E−03 | 9/176
7 | Proliferation_Positive regulation cell proliferation | 1.44E−03 | 10/221
Royalblue_genelist
1 | Protein folding_Response to unfolded proteins | 3.21E−07 | 6/69
2 | Apoptosis_Endoplasmic reticulum stress pathway | 2.67E−05 | 5/87
3 | Protein folding_ER and cytoplasm | 3.34E−05 | 4/45
4 | Immune response_Phagosome in antigen presentation | 5.02E−05 | 7/243
5 | Immune response_Antigen presentation | 1.23E−03 | 5/197
6 | Muscle contraction_Nitric oxide signaling in the cardiovascular system | 1.72E−03 | 4/125
Brown_genelist
1 | Development_Blood vessel morphogenesis | 2.08E−06 | 26/228
2 | Chemotaxis | 1.52E−04 | 16/137
3 | Cell adhesion_Leucocyte chemotaxis | 2.87E−04 | 20/205
4 | Immune response_IL-5 signalling | 3.71E−04 | 8/44
5 | Apoptosis_Death Domain receptors & caspases in apoptosis | 5.10E−04 | 14/123
6 | Proliferation_Negative regulation of cell proliferation | 5.57E−04 | 18/184
7 | Inflammation_Neutrophil activation | 6.78E−04 | 20/219
8 | Reproduction_Feeding and Neurohormone signaling | 1.09E−03 | 19/211
9 | Reproduction_Progesterone signaling | 1.22E−03 | 19/213
10 | Development_Hedgehog signaling | 1.80E−03 | 21/254
Purple_genelist
1 | Cell cycle_Core | 1.03E−43 | 42/115
2 | Cell cycle_Mitosis | 1.08E−30 | 39/179
3 | Cell cycle_S phase | 5.25E−30 | 36/149
4 | Cytoskeleton_Spindle microtubules | 3.84E−24 | 28/109
5 | Cell cycle_G2-M | 3.03E−17 | 29/206
6 | Cell cycle_G1-S | 5.30E−11 | 20/163
7 | DNA damage_Checkpoint | 6.06E−06 | 12/124
8 | DNA damage_DBS repair | 1.83E−05 | 11/116
9 | DNA damage_MMR repair | 1.59E−04 | 7/59
10 | DNA damage_BER-NER repair | 3.34E−04 | 9/110
Grey60_genelist
1 | Translation_Translation initiation | 1.35E−09 | 11/171
2 | Translation_Elongation-Termination | 2.66E−04 | 7/233
Green_genelist
1 | Inflammation_Interferon signaling | 1.18E−31 | 35/110
2 | Immune response_Innate immune response to RNA viral infection | 4.28E−11 | 16/84
3 | Inflammation_Inflammasome | 2.45E−06 | 13/118
4 | Immune response_Antigen presentation | 1.50E−04 | 14/197
5 | Inflammation_IFN-gamma signaling | 1.91E−04 | 10/110
6 | Inflammation_Complement system | 2.35E−04 | 8/73
7 | Chemotaxis | 2.75E−04 | 11/137
Black_genelist
1 | Immune response_TCR signaling | 2.32E−05 | 11/174
2 | Translation_Regulation of initiation | 3.29E−04 | 8/127

TABLE 11 Process Networks (Metacore) enrichment of the Discovery DE genes
# | Networks | pValue | Ratio
1 | Apoptosis_Apoptotic nucleus | 4.00E−07 | 43/159
2 | Apoptosis_Death Domain receptors & caspases in apoptosis | 3.87E−06 | 34/123
3 | Immune response_Phagosome in antigen presentation | 1.09E−05 | 54/243
4 | Immune response_Phagocytosis | 1.58E−05 | 50/222
5 | Immune response_TCR signaling | 6.83E−05 | 40/174
6 | Translation_Translation initiation | 1.01E−04 | 39/171
7 | Inflammation_Interferon signaling | 1.35E−04 | 28/110
8 | Apoptosis_Anti-apoptosis mediated by external signals via NF-kB | 1.59E−04 | 28/111
9 | Cell adhesion_Leucocyte chemotaxis | 1.67E−04 | 44/205
10 | Inflammation_IFN-gamma signaling | 7.96E−04 | 26/110
11 | Transcription_mRNA processing | 1.07E−03 | 34/160
12 | Signal Transduction_TGF-beta, GDF and Activin signaling | 1.09E−03 | 33/154
13 | Cell cycle_Mitosis | 1.13E−03 | 37/179
14 | Cytoskeleton_Actin filaments | 1.59E−03 | 36/176
15 | Cell adhesion_Platelet aggregation | 1.71E−03 | 33/158
16 | Reproduction_Progesterone signaling | 2.63E−03 | 41/213
17 | Proteolysis_Proteolysis in cell cycle and apoptosis | 2.64E−03 | 27/125
18 | Reproduction_FSH-beta signaling pathway | 4.07E−03 | 32/160
19 | Cell cycle_G2-M | 4.45E−03 | 39/206
20 | Signal Transduction_Cholecystokinin signaling | 4.99E−03 | 23/106
21 | Cytoskeleton_Regulation of cytoskeleton rearrangement | 5.79E−03 | 35/183
22 | Development_Regulation of angiogenesis | 6.11E−03 | 41/223
23 | Development_Melanocyte development and pigmentation | 6.80E−03 | 13/50
24 | Inflammation_IL-4 signaling | 6.98E−03 | 24/115
25 | Proteolysis_Ubiquitin-proteasomal proteolysis | 7.21E−03 | 32/166
26 | Inflammation_Neutrophil activation | 7.52E−03 | 40/219
# | Map folders | pValue | Ratio
1 | Immune system response | 6.72E−24 | 169/1000
2 | Inflammatory response | 2.80E−14 | 122/775
3 | DNA-damage response | 1.66E−13 | 71/354
4 | Cell cycle and its regulation | 1.06E−12 | 89/516
5 | Apoptosis | 3.71E−12 | 135/953
6 | Cell differentiation | 5.52E−10 | 127/940
7 | Tissue remodeling and wound repair | 1.11E−09 | 86/557
8 | Protein synthesis | 2.93E−09 | 56/306
9 | Vascular development (Angiogenesis) | 1.88E−08 | 81/543
10 | Cystic fibrosis disease | 3.70E−08 | 90/636
11 | Calcium signaling | 1.95E−07 | 70/469
12 | Protein degradation | 2.55E−07 | 47/269
13 | Mitogenic signaling | 7.87E−07 | 78/562
14 | Obesity | 1.60E−04 | 33/211
15 | Myogenesis regulation | 1.74E−04 | 19/95
16 | Transcription regulation | 4.46E−04 | 15/71
17 | Hypoxia response regulation | 4.64E−04 | 11/43
18 | Hematopoiesis | 2.30E−03 | 40/313
19 | Cardiac Hypertrophy | 4.62E−03 | 31/236
20 | Blood clotting | 9.88E−03 | 34/279

TABLE 12 Pathway comparison between discovery and replication datasets
# | Map folders | −log(pValue) | pValue | err(−log(pValue)) | Ratio
1 | DNA-damage response | 12.6825635; 12.87386859 | 2.08E−13; 1.34E−13 | 0.007 | 115/354
2 | Mitogenic signaling | 5.963371105; 6.084336396 | 1.09E−06; 8.24E−07 | 0.01 | 131/564
3 | Hematopoiesis | 2.603451962; 2.72607322 | 2.49E−03; 1.88E−03 | 0.023 | 66/313
4 | Cardiac Hypertrophy | 2.307858391; 2.422163659 | 4.92E−03; 3.78E−03 | 0.024 | 50/236
5 | Retinoid signaling | 0.201280315; 0.216596719 | 6.29E−01; 6.07E−01 | 0.037 | 13/105
6 | Androgen signaling | 0.439615077; 0.401100113 | 3.63E−01; 3.97E−01 | 0.046 | 32/224
7 | Lipid Biosynthesis and regulation | 0.300856313; 0.334325219 | 5.00E−01; 4.63E−01 | 0.053 | 51/389
8 | Transcription regulation | 3.33003263; 2.960189446 | 4.68E−04; 1.10E−03 | 0.059 | 25/71
9 | Neurotransmission | 0.416914634; 0.318216234 | 3.83E−01; 4.81E−01 | 0.134 | 94/720
10 | Cystic fibrosis disease | 7.338376591; 5.052909536 | 4.59E−08; 8.85E−06 | 0.184 | 142/636
11 | Vascular development (Angiogenesis) | 8.300162274; 5.156580335 | 5.01E−09; 6.97E−06 | 0.234 | 123/553
12 | Cell cycle and its regulation | 11.86264589; 19.1562068 | 1.37E−12; 6.98E−20 | 0.235 | 158/516
13 | Vasodilation | 0.991825816; 1.67325462 | 1.02E−01; 2.12E−02 | 0.256 | 65/402
14 | Apoptosis | 11.57348874; 6.646276062 | 2.67E−12; 2.26E−07 | 0.27 | 206/964
15 | Cell differentiation | 9.182897596; 5.207328211 | 6.56E−10; 6.20E−06 | 0.276 | 197/958
16 | Vasoconstriction | 0.424927674; 0.783570169 | 3.76E−01; 1.65E−01 | 0.297 | 51/357
17 | Myogenesis regulation | 3.735654493; 1.845576027 | 1.84E−04; 1.43E−02 | 0.339 | 28/95
18 | Visual perception | 0.187353984; 0.080451241 | 6.50E−01; 8.31E−01 | 0.399 | 15/133
19 | Oxidative stress regulation | 0.351542406; 0.844967771 | 4.45E−01; 1.43E−01 | 0.412 | 93/697
20 | Calcium signaling | 6.633763876; 2.38341947 | 2.32E−07; 4.14E−03 | 0.471 | 101/469
21 | Tissue remodeling and wound repair | 9.020315; 3.098542 | 9.54E−10; 7.97E−04 | 0.489 | 126/562
22 | Nicotine action | 0.160836; 0.052566 | 6.91E−01; 8.86E−01 | 0.507 | 21/229
23 | Estrogen signaling | 0.32413; 0.993962 | 4.74E−01; 1.01E−01 | 0.508 | 43/287
24 | Diuresis | 0.092696; 0.292771 | 8.08E−01; 5.10E−01 | 0.519 | 13/139
25 | Protein degradation | 6.535809; 1.91364 | 2.91E−07; 1.22E−02 | 0.547 | 65/269
26 | Protein synthesis | 8.461803; 2.360215 | 3.45E−09; 4.36E−03 | 0.564 | 79/306
27 | Obesity | 3.377786; 0.935168 | 4.19E−04; 1.16E−01 | 0.566 | 41/203
28 | Inflammatory response | 13.93779; 3.775208 | 1.15E−14; 1.68E−04 | 0.574 | 179/790
29 | Nucleotide metabolism and its regulation | 0.039482; 0.009839 | 9.13E−01; 9.78E−01 | 0.601 | 42/401
30 | Hypoxia response regulation | 3.317314; 0.748119 | 4.82E−04; 1.79E−01 | 0.632 | 13/43
31 | Nuclear receptor signaling | 0.110138; 0.515415 | 7.76E−01; 3.05E−01 | 0.648 | 75/595
32 | Energy metabolism and its regulation | 0.311669; 1.649752 | 4.88E−01; 2.24E−02 | 0.682 | 133/927
33 | Blood clotting | 1.977572; 0.313006 | 1.05E−02; 4.86E−01 | 0.727 | 48/279
34 | Immune system response | 23.03588; 3.06118 | 9.21E−24; 8.69E−04 | 0.765 | 233/1007
35 | Spermatogenesis | 1.540909; 0.098378 | 2.88E−02; 7.97E−01 | 0.88 | 6/22
36 | Phospholipid Metabolism | 0.004935; 0.108128 | 9.89E−01; 7.80E−01 | 0.912 | 17/205
37 | Cholesterol and bile acid homeostasis | 4.34E−05; 0.015158 | 1.00E+00; 9.66E−01 | 0.995 | 38/471
38 | Aminoacid metabolism and its regulation | 0; 4.34E−05 | 1.00E+00; 1.00E+00 | 1 | 69/944
39 | Vitamin and cofactor metabolism and its regulation | 0; 0 | 1.00E+00; 1.00E+00 | 1 | 34/688

TABLE 13 Commonly dysregulated pathways in discovery and replication toddlers
# | Map folders | −log(pValue) | pValue | Ratio
1 | DNA-damage response | 6.598599 | 2.52E−07 | 20/354
2 | Cell cycle and its regulation | 5.160019 | 6.92E−06 | 22/516
3 | Apoptosis | 4.645892 | 2.26E−05 | 31/964
4 | Vascular development (Angiogenesis) | 4.177832 | 6.64E−05 | 21/553
5 | Obesity | 2.446238 | 3.58E−03 | 9/203
6 | Immune system response | 2.430275 | 3.71E−03 | 26/1007
7 | Cell differentiation | 2.406936 | 3.92E−03 | 25/958
8 | Tissue remodeling and wound repair | 2.353302 | 4.43E−03 | 17/562
9 | Cardiac Hypertrophy | 2.025304 | 9.43E−03 | 9/236
10 | Mitogenic signaling | 1.977159 | 1.05E−02 | 16/564

TABLE 14 Process Networks and Pathway Maps (Metacore) enrichment of the four gene modules used as classifier
# | Networks | −log(pValue) | pValue | Ratio
1 | Translation_Translation initiation | 9.130416292 | 7.41E−10 | 27/171
2 | Inflammation_IFN-gamma signaling | 5.798876103 | 1.59E−06 | 17/110
3 | Translation_Elongation-Termination | 5.696587929 | 2.01E−06 | 26/233
4 | Translation_Elongation-Termination_test | 5.696587929 | 2.01E−06 | 26/233
5 | Cell adhesion_Platelet aggregation | 5.322575562 | 4.76E−06 | 20/158
6 | Immune response_Phagocytosis | 5.056653902 | 8.78E−06 | 24/222
7 | Cell adhesion_Leucocyte chemotaxis | 4.141102043 | 7.23E−05 | 21/205
8 | Signal Transduction_Cholecystokinin signaling | 4.088735829 | 8.15E−05 | 14/106
9 | Immune response_TCR signaling | 3.677367288 | 2.10E−04 | 18/174
10 | Cell cycle_G1-S Growth factor regulation | 3.513144645 | 3.07E−04 | 19/195
# | Map folders | −log(pValue) | pValue | Ratio
1 | Immune system response | 11.64859025 | 2.25E−12 | 66/1007
2 | Protein synthesis | 9.648590248 | 2.25E−10 | 31/306
3 | Tissue remodeling and wound repair | 8.799423073 | 1.59E−09 | 42/562
4 | Inflammatory response | 7.558461961 | 2.76E−08 | 49/790
5 | Vascular development (Angiogenesis) | 7.451610582 | 3.54E−08 | 39/553
6 | Calcium signaling | 7.297741837 | 5.04E−08 | 35/469
7 | Cell differentiation | 6.130416292 | 7.41E−07 | 52/958
8 | Mitogenic signaling | 5.813326133 | 1.54E−06 | 36/564
9 | Hypoxia response regulation | 5.684659523 | 2.07E−06 | 9/43
10 | Cystic fibrosis disease | 5.016779785 | 9.62E−06 | 37/636
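In Tables 12 to 14, the −log(pValue) values match the negative base-10 logarithm of the corresponding pValue. For example, −log10(7.41E−10) ≈ 9.13 for Translation_Translation initiation in Table 14. The short Python check below uses rows transcribed from Tables 13 and 14. The printed p-values are rounded to three significant figures, so the comparison allows a tolerance of 0.01 rather than requiring exact equality. The base-10 reading is inferred from the tabulated numbers, and the helper name `neg_log10` is illustrative.

```python
import math

# (label, pValue, reported -log(pValue)) transcribed from Tables 13 and 14
ROWS = [
    ("Translation_Translation initiation", 7.41e-10, 9.130416292),
    ("Inflammation_IFN-gamma signaling", 1.59e-06, 5.798876103),
    ("Immune system response", 2.25e-12, 11.64859025),
    ("Hypoxia response regulation", 2.07e-06, 5.684659523),
    ("DNA-damage response", 2.52e-07, 6.598599),
]


def neg_log10(p: float) -> float:
    """Negative base-10 logarithm of a p-value (illustrative helper)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p-value must lie in (0, 1]")
    return -math.log10(p)


for label, p, reported in ROWS:
    computed = neg_log10(p)
    # Small differences arise only because the printed p-values are rounded.
    print(f"{label}: computed {computed:.4f}, reported {reported:.4f}")
```

On this scale, a larger −log(pValue) means stronger enrichment. This is why the Table 13 rows fall from 6.60 for DNA-damage response to 1.98 for Mitogenic signaling as the p-values rise.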

TABLE 15
Gene set | Genes
Kong et al. signature genes overlapping the DE genes from the discovery subjects | ADAM10 AHNAK CREBBP IFNAR2 KBTBD11 KIAA0247 KIDINS220 MGAT4A PTPRE ROCK1 SERINC3 ZNF12
Glatt et al. signature genes overlapping the DE genes from the discovery subjects | ANKRD22 ANXA3 APOBEC3G C11orf75 C3orf38 CARD17 FCGR1A FCGR1B GBP1 GBP5 GCH1 IFI16 IL1RN LOC644852 PARP9 PLSCR1 TAP1 VWF
Kong et al. signature genes overlapping with the four gene modules classifier | AHNAK CREBBP KBTBD11 KIAA0247 KIDINS220 ROCK1
Glatt et al. signature genes overlapping with the four gene modules classifier | VWF
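Each row of Table 15 is the intersection of a published signature with a gene list from this study. The full Kong et al. and Glatt et al. signatures are not reproduced here. The Python sketch below therefore works only with the overlaps listed in the table. It shows that, for both signatures, the genes overlapping the four gene modules classifier are a subset of the genes overlapping the discovery DE genes. It also lists the overlapping DE genes that do not appear in the classifier overlap. The set names and the `summarize` helper are illustrative.

```python
# Overlap lists transcribed from Table 15.
KONG_DE = {"ADAM10", "AHNAK", "CREBBP", "IFNAR2", "KBTBD11", "KIAA0247",
           "KIDINS220", "MGAT4A", "PTPRE", "ROCK1", "SERINC3", "ZNF12"}
KONG_CLASSIFIER = {"AHNAK", "CREBBP", "KBTBD11", "KIAA0247", "KIDINS220", "ROCK1"}
GLATT_DE = {"ANKRD22", "ANXA3", "APOBEC3G", "C11orf75", "C3orf38", "CARD17",
            "FCGR1A", "FCGR1B", "GBP1", "GBP5", "GCH1", "IFI16", "IL1RN",
            "LOC644852", "PARP9", "PLSCR1", "TAP1", "VWF"}
GLATT_CLASSIFIER = {"VWF"}


def summarize(de_overlap: set, classifier_overlap: set) -> tuple:
    """Return (classifier overlap is a subset of DE overlap,
    sorted DE-overlap genes absent from the classifier overlap)."""
    return classifier_overlap <= de_overlap, sorted(de_overlap - classifier_overlap)


for name, de, clf in (("Kong et al.", KONG_DE, KONG_CLASSIFIER),
                      ("Glatt et al.", GLATT_DE, GLATT_CLASSIFIER)):
    is_subset, not_in_classifier = summarize(de, clf)
    print(f"{name}: {len(clf)} of {len(de)} DE-overlap genes in classifier; "
          f"subset={is_subset}; not in classifier: {', '.join(not_in_classifier)}")
```

With the full signature and DE gene lists in hand, the same set operations (`&` for intersection, `-` for difference) would reproduce the rows of the table directly.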

TABLE 16 Gene Listing of Unique Differentially Expressed (DE) Genes
SEPT6 SEPT7 SEPT9 SEPT11 SEPT14 AAK1 ABAT ABCB1 ABCC3 ABCG1 ABHD13 ABHD14A ABHD14B ABHD15 ABHD7 ABL1 ACAA1 ACACB ACAD11 ACAD8 ACADVL ACD ACER2 ACOT4 ACOT9 ACSL1 ACSM3 ACTA2 ACTR2 ACYP2 ADAM10 ADAM17 ADAM19 ADAM28 ADARB1 ADCY7 ADI1 ADNP ADNP2 ADPRHL2 AES AFF1 AGAP8 AGER AGPAT3 AHCTF1 AHCY AHI1 AHNAK AIF1 AIM2 AIP AIRE AK2P2 AK3 AK5 AKAP7 AKR1C3 AKR1D1 AKR7A3 AKT1 AKTIP ALDH5A1 ALG10B ALG13 ALKBH7 ALKBH8 ALOX12 ALOX5 ALPK1 ALPP ALS2CR14 AMOTL2 AMY1A AMY1B AMY2B AMZ2 ANGPT1 ANKRD12 ANKRD22 ANKRD28 ANKRD36 ANKRD41 ANKRD44 ANP32A ANP32C ANXA1 ANXA11 ANXA2 ANXA2P1 ANXA2P3 ANXA3 ANXA4 AP1B1 AP1G1 AP1G2 AP1M2 AP1S1 AP2A1 AP2M1 AP2S1 APBA2 API5 APOA1BP APOBEC3G APOL2 APPL2 AQP12A ARAP2 ARAP3 ARF1 ARF6 ARFGAP3 ARHGAP10 ARHGAP17 ARHGAP21 ARHGAP25 ARHGAP27 ARHGAP30 ARHGAP9 ARHGDIA ARHGEF18 ARHGEF3 ARID1A ARID2 ARID4A ARID4B ARL17B ARL4C ARL5A ARL6IP1 ARMC5 ARRB2 ARRDC2 ASAP1 ASB1 ASCC3 ASMTL ATG10 ATG2A ATG3 ATG4C ATHL1 ATM ATN1 ATP1B1 ATP2B4 ATP5A1 ATP5D ATP5E ATP5O ATP6V0C ATP6V1C1 ATPGD1 ATR ATRX AXIN1 AZIN1 B3GALT6 B3GAT1 BAG4 BATF BATF2 BAZ1B BBX BCAP31 BCAS2 BCKDHA BCL11B BCL2 BCL2A1 BCL2L11 BCL6 BCL9 BCL9L BCOR BCORL1 BCR BEGAIN BEX1 BIN2 BIRC3 BIVM BLNK BMF BMP8B BMPR2 BPGM BRD3 BRD7P2 BRDG1 BRPF3 BRWD1 BRWD2 BST1 BTBD2 BTF3 BTK BUB3 C10orf104 C10orf35 C10orf4 C10orf47 C10orf58 C10orf76 C11orf1 C11orf2 C11orf46 C11orf63 C11orf73 C11orf75 C11orf82 C12orf29 C12orf30 C12orf32 C12orf65 C13orf15 C13orf18 C14orf102 C14orf11 C14orf135 C14orf138 C14orf19 C14orf28 C14orf32 C14orf43 C14orf82 C15orf21 C15orf26 C15orf52 C15orf57 C16orf30 C16orf53 C16orf57 C16orf68 C16orf69 C17orf41 C17orf45 C17orf87 C18orf10 C18orf32 C19orf12 C19orf2 C19orf25 C19orf39 C19orf53 C19orf56 C19orf59 C19orf6 C19orf60 C1D C1GALT1 C1GALT1C1 C1orf110 C1orf151 C1orf166 C1orf186 C1orf43 C1orf63 C1orf71 C1orf77 C1orf85 C1orf86 C1orf9 C1QB C1QBP C20orf100 C20orf108 C20orf11 C20orf196 C20orf199 C20orf29 C20orf30 C20orf4 C20orf55 C20orf94 C21orf33 C21orf66 C21orf7
C22orf29 C22orf32 C22orf34 C2orf15 C2orf21 C2orf69 C2orf89 C3orf10 C3orf17 C3orf34 C3orf38 C3orf58 C3orf63 C4orf16 C4orf32 C4orf34 C4orf43 C5orf20 C5orf4 C5orf41 C5orf53 C6orf150 C6orf160 C6orf170 C6orf204 C6orf211 C6orf225 C6orf48 C6orf62 C7orf11 C7orf28A C7orf41 C7orf70 C8orf33 C9orf109 C9orf127 C9orf130 C9orf30 C9orf5 C9orf72 C9orf80 C9orf85 CA2 CA5B CABC1 CABIN1 CABP5 CACYBP CALM1 CALML4 CAMK1D CAMSAP1L1 CANX CAPS2 CAPZA1 CARD14 CARD16 CARD17 CARS2 CASP1 CASP2 CASP4 CASP5 CASP8 CAST CBFB CBL CBR3 CBS CBWD1 CBWD3 CBX7 CCAR1 CCDC115 CCDC117 CCDC147 CCDC15 CCDC16 CCDC23 CCDC25 CCDC28B CCDC50 CCDC59 CCDC6 CCDC65 CCDC72 CCDC82 CCDC86 CCDC90A CCDC90B CCDC91 CCDC97 CCL2 CCL8 CCNG1 CCNK CCNL1 CCNY CCNYL1 CCR4 CCRL2 CCS CCT3 CCT6P1 CD164 CD1E CD27 CD274 CD300LB CD320 CD3D CD3E CD3G CD40LG CD47 CD6 CD74 CD79B CD84 CD97 CD99 CDAN1 CDC14A CDC14B CDC25B CDC2L2 CDC2L6 CDC42SE2 CDK2AP2 CDK5RAP3 CEACAM1 CEACAM4 CECR1 CENPL CENPV CENTB2 CENTD1 CENTG2 CENTG3 CEP27 CEP350 CEP63 CEP68 CEPT1 CERK CETN3 CHCHD2 CHD3 CHD8 CHES1 CHM CHML CHMP2A CHMP5 CHORDC1 CHP CHPF2 CICK0721Q.1 CIR1 CITED4 CKAP5 CKS2 CLASP1 CLEC10A CLEC11A CLEC12A CLEC12B CLEC4A CLEC4D CLEC7A CLIC4 CLIP1 CLIP2 CLIP3 CLK1 CLK3 CLN8 CLSTN1 CMIP CMPK1 CMTM3 CMTM4 CMTM7 CNIH4 CNN3 CNNM3 CNOT1 CNOT7 COBRA1 COL24A1 COMMD7 COMMD8 COPS2 COX11 COX7A2L CPEB3 CPNE1 CR1 CRBN CREBBP CREM CRIP1 CRIP2 CRIPT CROP CRY2 CRYZL1 CS CSDE1 CSE1L CSF2RB CSNK1A1L CSNK1E CTAGE6 CTDP1 CTDSP1 CTDSPL CTNNB1 CTRL CTSB CTSC CTSF CTSL1 CTTN CUGBP2 CUTA CUTC CUTL1 CWC22 CXCL5 CXCR3 CXCR6 CXCR7 CXorf12 CXorf20 CXorf21 CXorf57 CYB561D1 CYB5R1 CYCSL1 CYCSP52 CYFIP2 CYLD CYLN2 CYP20A1 D4S234E DAB2 DACH1 DAP DAP3 DAPK2 DAPP1 DBI DBP DBT DCAF16 DCAF7 DCK DCLRE1C DCTN1 DCTN6 DCXR DDHD1 DDHD2 DDIT4 DDX27 DDX3X DDX3Y DDX41 DDX46 DDX58 DDX59 DDX60 DDX60L DECR2 DEDD DENND2D DERL2 DFFA DGCR8 DGKD DHPS DHRS3 DHRS7 DHX34 DHX9 DIAPH1 DIP2B DKFZp434K191 DKFZp686I15217 DKFZp761P0423 DLEU1 DLEU2 DLEU2L DLGAP4 DMWD DMXL1 DMXL2 DNAJB14 DNAJB2 DNAJC25-GNG10 DNAJC30 DNAJC7 DNHD2 DNHL1 DNM3 DNTT
DNTTIP2 DOPEY2 DPEP2 DPM3 DPP3 DRD4 DSC2 DSTN DTWD1 DTX3L DULLARD DUSP14 DUSP22 DYNLT3 DYRK2 ECHDC1 ECT2 EDAR EDC3 EEF1A1 EEF1B2 EEF1G EEF2 EEF2K EFCAB2 EIF2AK1 EIF2AK4 EIF2C2 EIF2S3 EIF3D EIF3F EIF3G EIF3H EIF3K EIF3L EIF4B EIF4E EIF5A ELA1 ELF2 ELMO1 ELOVL4 ENDOD1 ENO2 ENO3 EP300 EP400 EPAS1 EPB41 EPB49 EPHA1 EPHA10 EPHA4 EPN2 EPOR EPSTI1 ERGIC1 ERMN ERMP1 ERVWE1 ESF1 ESYT1 ETFB ETNK1 EVI2B EWSR1 EXOC8 FABP5 FABP5L3 FAHD1 FAIM3 FAM101B FAM102A FAM107B FAM108B1 FAM10A4 FAM116A FAM119A FAM120A FAM122A FAM125B FAM126B FAM134A FAM134B FAM13A FAM153B FAM173A FAM195B FAM19A2 FAM26F FAM3A FAM40B FAM62B FAM65B FAM72D FAM73A FAM84B FAM91A2 FANCL FAS FASTK FBLN1 FBLN2 FBP1 FBXL11 FBXL3 FBXO21 FBXO3 FBXO31 FBXO32 FBXO38 FBXO44 FBXO5 FBXO6 FCER1A FCGBP FCGR1A FCGR1B FCGR1C FCGR2B FCGR2C FCGR3A FCRL3 FERMT3 FEZ1 FEZ2 FFAR2 FHL3 FICD FKBP14 FKBP1A FKBP1P1 FKRP FKTN FLJ10088 FLJ10916 FLJ12078 FLJ13611 FLJ20444 FLJ25363 FLJ34047 FLJ37396 FLJ39639 FLJ42627 FLJ45256 FLT3LG FNBP1 FNIP2 FOXJ2 FOXK1 FOXO1 FOXP1 FTHL11 FTHL16 FTHL2 FTHL3 FTHL8 FTO FTSJ1 FUT6 FXYD5 FYN FYTTD1 FZD7 GABARAPL2 GABBR1 GALNT3 GALNT7 GALT GAR1 GATAD2A GATAD2B GATS GBA GBP1 GBP2 GBP3 GBP5 GBP6 GCC2 GCET2 GCH1 GDI1 GDPD1 GDPD5 GEMIN4 GFI1B GIMAP7 GIPC1 GIYD2 GK GKAP1 GLG1 GLRX5 GLTSCR1 GLTSCR2 GMCL1 GMPPB GNAI2 GNG10 GNG5 GNG7 GNL3L GNPDA2 GNPTAB GOLGA3 GOLGA8B GOLPH3L GOLPH4 GOT2 GP1BA GPAM GPBP1L1 GPN1 GPN3 GPR1 GPR128 GPR141 GPR180 GPR65 GPR68 GPR84 GPR97 GPSM3 GPX4 GRAP2 GRASP GRB14 GRN GRPEL2 GRWD1 GSDM1 GSDMB GSTM1 GSTM2 GSTM3 GSTM4 GSTTP2 GTF2IRD2B GTF3A GTF3C6 GTPBP8 GUCY1A3 GUSBL1 GVIN1 HABP4 HCCA2 HCCS HCFC1 HCFC1R1 HCLS1 HCST HEATR3 HEBP1 HECTD3 HELZ HEMGN HERC1 HERC2 HERPUD2 HEXDC HGD HHEX HIAT1 HIBCH HIGD2A HINT3 HIP1R HIPK2 HIST1H2AD HIST1H2AE HIST2H2AB HK1 HLA-C HLA-DRB4 HLA-DRB6 HLA-H HM13 HMBOX1 HMGB1 HMGN3 HN1 HNRNPA1L2 HNRNPK HNRNPU HNRPC HNRPH1 HNRPH3 HNRPK HNRPUL1 HOMER2 HOOK1 HOOK3 HORMAD1 HOXC4 HOXC6 HPCAL4 HPSE HRSP12 HSCB HSH2D HSP90AB4P HSPA13 HSPA1L HSPA9 HSPB1 HSPCAL3 HVCN1 HYAL3 HYALP1 HYOU1 ICA1 ICK IDH2 IDI1
IFFO2 IFI16 IFI27 IFI44 IFI44L IFI6 IFIT3 IFITM4P IFNAR2 IFT20 IGF2BP2 IGF2BP3 IGF2R IGFL3 IKZF1 IL10 IL10RB IL18RAP IL19 IL1RN IL23A IL25 IL27 IL27RA IL4I1 IL6ST IL7R ILF3 ILK ILVBL IMMP1L IMPA1 IMPA2 INADL ING2 ING3 INPP4B INSM1 INTS1 INTU IP6K1 IP6K2 IPO13 IQCB1 IQGAP2 IRAK1 IRF2 IRF5 IRF7 IRF9 IRS2 IRX3 ISCA1 ISG15 ISG20L2 ITFG1 ITGAL ITGAX ITGB1BP1 ITGB5 ITM2B ITPKB JAM3 JARID1A JARID2 JUP KATNAL1 KBTBD11 KCNA3 KCNG1 KCNH7 KCTD12 KCTD7 KDM1B KDM5B KDM6B KHDRBS1 KHNYN KIAA0040 KIAA0182 KIAA0247 KIAA0319L KIAA0355 KIAA0408 KIAA0776 KIAA1026 KIAA1033 KIAA1147 KIAA1279 KIAA1324 KIAA1430 KIAA1545 KIAA1704 KIAA1715 KIAA1737 KIAA1881 KIAA2026 KIDINS220 KIF13B KIF21B KIF22 KIF2A KIT KLF12 KLF5 KLF6 KLF9 KLHL20 KLHL24 KLHL28 KLRB1 KLRG1 KPNA2 KPNA6 KRCC1 KREMEN1 KREMEN2 KRT40 KRT73 KRT8P9 KRTAP19-6 KTELC1 LACTB LAPTM4A LAPTM4B LAPTM5 LARGE LARP1 LARP1B LASP1 LASS6 LAX1 LCK LCLAT1 LCMT2 LDHA LDHB LDLRAP1 LDOC1L LEF1 LEP LEPROT LFNG LGALS3 LGALS3BP LGALS8 LGALS9 LGALS9B LGMN LGSN LHFPL2 LIAS LIG4 LILRA1 LILRA3 LILRA6 LILRB1 LIMA1 LIMK2 LIMS1 LIN7C LLPH LMF2 LMNB1 LMNB2 LMTK3 LOC100124692 LOC100127893 LOC100127894 LOC100127922 LOC100127975 LOC100127993 LOC100128060 LOC100128062 LOC100128252 LOC100128269 LOC100128274 LOC100128291 LOC100128410 LOC100128460 LOC100128485 LOC100128498 LOC100128516 LOC100128525 LOC100128533 LOC100128548 LOC100128627 LOC100128729 LOC100128731 LOC100128908 LOC100128994 LOC100129055 LOC100129067 LOC100129094 LOC100129139 LOC100129201 LOC100129243 LOC100129267 LOC100129424 LOC100129426 LOC100129441 LOC100129445 LOC100129466 LOC100129502 LOC100129543 LOC100129608 LOC100129637 LOC100129645 LOC100129681 LOC100129686 LOC100129934 LOC100129952 LOC100129960 LOC100129982 LOC100130000 LOC100130053 LOC100130070 LOC100130154 LOC100130171 LOC100130255 LOC100130276 LOC100130289 LOC100130332 LOC100130520 LOC100130550 LOC100130561 LOC100130562 LOC100130598 LOC100130624 LOC100130707 LOC100130715 LOC100130764 LOC100130769 LOC100130892 LOC100130932 LOC100130980 LOC100131076 LOC100131096
LOC100131253 LOC100131349 LOC100131452 LOC100131526 LOC100131572 LOC100131662 LOC100131672 LOC100131675 LOC100131713 LOC100131718 LOC100131810 LOC100131835 LOC100131850 LOC100131866 LOC100131989 LOC100132037 LOC100132086 LOC100132199 LOC100132288 LOC100132323 LOC100132395 LOC100132425 LOC100132444 LOC100132493 LOC100132499 LOC100132510 LOC100132521 LOC100132526 LOC100132547 LOC100132652 LOC100132707 LOC100132717 LOC100132724 LOC100132728 LOC100132742 LOC100132761 LOC100132797 LOC100132804 LOC100132888 LOC100132901 LOC100132920 LOC100133034 LOC100133077 LOC100133080 LOC100133129 LOC100133163 LOC100133177 LOC100133220 LOC100133298 LOC100133329 LOC100133398 LOC100133692 LOC100133697 LOC100133760 LOC100133770 LOC100133803 LOC100133875 LOC100134053 LOC100134159 LOC100134172 LOC100134241 LOC100134291 LOC100134537 LOC100134624 LOC100134688 LOC100134868 LOC100170939 LOC123688 LOC127295 LOC130773 LOC146053 LOC147727 LOC147804 LOC163233 LOC196752 LOC197135 LOC202134 LOC202227 LOC253039 LOC255809 LOC25845 LOC283267 LOC283412 LOC283874 LOC283953 LOC284672 LOC286016 LOC286444 LOC338799 LOC339192 LOC339352 LOC339799 LOC339843 LOC345041 LOC345645 LOC347292 LOC374443 LOC387791 LOC387820 LOC387841 LOC387934 LOC388122 LOC388339 LOC388556 LOC388564 LOC388955 LOC389053 LOC389168 LOC389286 LOC389322 LOC389342 LOC389386 LOC389404 LOC389765 LOC389816 LOC390183 LOC390345 LOC390414 LOC390530 LOC390578 LOC390735 LOC390876 LOC391045 LOC391169 LOC391334 LOC391655 LOC391670 LOC391769 LOC391825 LOC391833 LOC392288 LOC392501 LOC399881 LOC399988 LOC400061 LOC400389 LOC400446 LOC400455 LOC400464 LOC400652 LOC400750 LOC400759 LOC400836 LOC400948 LOC400963 LOC401076 LOC401252 LOC401537 LOC401623 LOC401717 LOC401817 LOC401845 LOC402057 LOC402112 LOC402221 LOC402562 LOC402677 LOC402694 LOC439949 LOC439992 LOC440055 LOC440093 LOC440157 LOC440280 LOC440396 LOC440525 LOC440563 LOC440595 LOC440737 LOC440776 LOC440926 LOC440927 LOC441013 LOC441032 LOC441154 LOC441155 LOC441246 LOC441642 LOC441907 LOC441956 LOC442064 LOC442153 LOC442181
LOC442232 LOC442270 LOC442319 LOC442517 LOC442582 LOC552889 LOC641727 LOC641746 LOC641848 LOC641849 LOC641989 LOC641992 LOC642017 LOC642031 LOC642033 LOC642035 LOC642073 LOC642076 LOC642083 LOC642118 LOC642120 LOC642178 LOC642222 LOC642236 LOC642250 LOC642299 LOC642357 LOC642393 LOC642443 LOC642458 LOC642502 LOC642567 LOC642585 LOC642738 LOC642741 LOC642755 LOC642909 LOC642954 LOC642974 LOC643007 LOC643015 LOC643031 LOC643187 LOC643272 LOC643384 LOC643387 LOC643424 LOC643433 LOC643531 LOC643534 LOC643550 LOC643668 LOC643680 LOC643779 LOC643870 LOC643882 LOC643896 LOC643960 LOC643980 LOC643997 LOC644029 LOC644037 LOC644063 LOC644094 LOC644101 LOC644131 LOC644315 LOC644330 LOC644380 LOC644464 LOC644482 LOC644496 LOC644577 LOC644642 LOC644655 LOC644745 LOC644774 LOC644799 LOC644852 LOC644877 LOC644931 LOC644964 LOC645018 LOC645052 LOC645086 LOC645173 LOC645233 LOC645236 LOC645251 LOC645351 LOC645452 LOC645489 LOC645515 LOC645630 LOC645691 LOC645693 LOC645715 LOC645737 LOC645762 LOC645944 LOC645968 LOC646034 LOC646044 LOC646197 LOC646294 LOC646491 LOC646527 LOC646531 LOC646630 LOC646672 LOC646688 LOC646766 LOC646784 LOC646785 LOC646808 LOC646821 LOC646836 LOC646841 LOC646897 LOC646900 LOC646909 LOC646942 LOC646949 LOC646956 LOC646966 LOC646996 LOC647030 LOC647037 LOC647074 LOC647086 LOC647195 LOC647276 LOC647460 LOC647654 LOC647908 LOC648059 LOC648283 LOC648343 LOC648509 LOC648526 LOC648638 LOC648705 LOC648733 LOC648740 LOC648749 LOC648822 LOC648863 LOC648907 LOC648921 LOC648980 LOC648984 LOC649088 LOC649150 LOC649209 LOC649214 LOC649260 LOC649330 LOC649447 LOC649456 LOC649801 LOC649821 LOC649839 LOC649873 LOC650321 LOC650638 LOC650737 LOC650898 LOC651064 LOC651198 LOC651316 LOC651738 LOC651816 LOC651919 LOC652113 LOC652274 LOC652750 LOC652755 LOC652837 LOC653056 LOC653080 LOC653086 LOC653105 LOC653115 LOC653157 LOC653162 LOC653316 LOC653324 LOC653375 LOC653450 LOC653486 LOC653489 LOC653496 LOC653559 LOC653596 LOC653737 LOC653829 LOC653884 LOC653888 LOC653994 LOC654074 LOC654096 LOC654121
LOC654346 LOC654350 LOC727762 LOC727821 LOC727848 LOC727962 LOC727970 LOC728002 LOC728026 LOC728031 LOC728060 LOC728093 LOC728105 LOC728115 LOC728128 LOC728170 LOC728179 LOC728207 LOC728310 LOC728416 LOC728428 LOC728457 LOC728499 LOC728519 LOC728576 LOC728602 LOC728608 LOC728650 LOC728661 LOC728666 LOC728715 LOC728744 LOC728748 LOC728755 LOC728820 LOC728908 LOC728953 LOC728973 LOC729143 LOC729196 LOC729200 LOC729236 LOC729255 LOC729279 LOC729342 LOC729366 LOC729369 LOC729397 LOC729402 LOC729409 LOC729423 LOC729505 LOC729510 LOC729513 LOC729519 LOC729645 LOC729652 LOC729677 LOC729679 LOC729683 LOC729686 LOC729687 LOC729692 LOC729739 LOC729760 LOC729764 LOC729779 LOC729789 LOC729798 LOC729806 LOC729843 LOC729898 LOC729985 LOC730029 LOC730052 LOC730060 LOC730187 LOC730202 LOC730246 LOC730281 LOC730316 LOC730324 LOC730382 LOC730432 LOC730534 LOC730746 LOC730924 LOC730990 LOC730993 LOC731096 LOC731308 LOC731314 LOC731365 LOC731751 LOC731789 LOC732229 LOC732360 LOC92017 LOC92249 LOC92755 LPAR5 LPHN1 LPIN2 LRBA LRFN3 LRIG1 LRPAP1 LRRC14 LRRC16A LRRC26 LRRC40 LRRK2 LSM5 LSP1 LTB4R LUZP1 LYAR LYPLA1 LYRM4 LYRM7 LYSMD3 MAD2L1 MAD2L1BP MAEA MAF MAFF MAGED4B MAGEE1 Magmas MAL MAML3 MAN1C1 MAP1LC3A MAP2K4 MAP3K7IP1 MAP3K8 MAPBPIP MAPK8IP3 MAPKAPK2 MAPRE3 MARCKSL1 MARS2 MAST3 MAZ MBD2 MBD3 MBOAT2 MBP MBTPS1 MCART1 MCHR2 MCM3APAS MCTP1 MCTP2 MCTS1 MDC1 MDH2 ME2 MED21 MED24 MED31 MEF2A MEF2C MEF2D MEGF6 MEIS1 METAP1 METTL9 MFNG MGAT3 MGAT4A MGC10997 MGC12760 MGC13005 MGC21881 MGC26356 MGC3020 MGC40489 MGC42367 MGC4677 MGC52498 MGC87895 MID1IP1 MIER2 MIIP MIR1299 MIR142 MIR1974 MIR2116 MIR574 MIR877 MIR98 MIS12 MLEC MLKL MLL5 MMGT1 MMP28 MNT MOAP1 MOBKL2C MORC2 MPDU1 MPHOSPH10 MPL MPP6 MRI1 MRP63 MRPL17 MRPL3 MRPL40 MRPL41 MRPL43 MRPL44 MRPL45 MRPL47 MRPL55 MRPS10 MRPS15 MRPS25 MRPS27 MRPS34 MRRF MS4A1 MS4A14 MS4A2 MS4A3 MS4A4A MS4A7 MSH2 MSL3 MSRB3 MSX2P1 MTCP1 MTF2 MTHFD2 MTMR14 MTMR3 MTUS1 MTX1 MTX3 MUM1 MUT MVP MXI1 MYB MYCBP2 MYH9 MYO9B MYOM1 MYST3 N4BP2 N4BP2L1 NAALADL1 NACAP1 NAGLU NAGPA NAIP NAP1L1
NAT6 NAT8B NBEA NBL1 NBN NBPF14 NCALD NCBP2 NCF1B NCF4 NCKAP1L NCOA5 NCOR2 NCR3 NCRNA00081 NCRNA00085 NCRNA00092 NCRNA00152 NDE1 NDFIP1 NDRG3 NDUFA4 NDUFA5 NDUFAF3 NDUFB11 NDUFC1 NDUFS1 NEIL2 NELL2 NENF NFATC2IP NFIC NFIX NFKBIA NFKBIB NFKBIL2 NFX1 NFYB NHLRC4 NHP2 NIP7 NIPSNAP3A NLRC5 NLRP1 NLRP12 NLRP7 NLRP8 NLRX1 NME2 NMI NMT2 NNT NOG NOL9 NOTCH2NL NOX4 NPAL3 NPAT NR1D2 NR3C1 NR3C2 NR4A2 NT5C NT5C3L NT5DC1 NT5DC3 NTNG2 NUAK2 NUBPL NUCB2 NUCKS1 NUDCD2 NUDT16L1 NUDT2 NUDT21 NUFIP2 NUMA1 OAF OAS1 OAS3 OASL ODC1 OGFOD1 OLA1 OMA1 OPN1SW OR1J1 OR2A42 OR7E156P ORC5L OSBPL1A OSBPL8 OSTCL OTOF OTUD1 P2RX5 P2RY8 P4HB P704P P76 PA2G4 PABPC1 PABPC4 PACS1 PAFAH2 PAK1IP1 PAK2 PALLD PAN3 PAPD5 PAPSS1 PAPSS2 PAQR4 PAQR8 PARM1 PARP10 PARP14 PARP15 PARP8 PARP9 PATE2 PATL2 PCBP2 PCDHB9 PCDHGB6 PCYOX1 PDCD10 PDCD2 PDCL PDE12 PDE5A PDE7A PDF PDIA3P PDK1 PDPK1 PDZD4 PEBP1 PECI PELI2 PELP1 PEMT PEX11B PEX14 PFKFB3 PFKL PFN2 PFTK1 PGAM1 PGAM4 PGGT1B PGLS PGM2 PGM2L1 PHACTR2 PHAX PHB PHC2 PHC3 PHF11 PHF14 PHF2 PHF20L1 PHKB PHLDB3 PI4K2B PIAS2 PID1 PIGT PIGX PIK3AP1 PIK3CD PIK3CG PIK3IP1 PIK3R1 PIM2 PIM3 PIN1 PION PIP3-E PIP4K2A PIP5K1C PIP5K2A PITPNC1 PJA2 PKIA PKM2 PKN1 PKN2 PLA2G2D PLAA PLAG1 PLAGL1 PLAUR PLCB2 PLCG1 PLCXD1 PLD3 PLD6 PLEKHA1 PLEKHA5 PLEKHB1 PLEKHB2 PLEKHF1 PLIN2 PLSCR1 PLXNA4 PML PMM2 PMS2L1 PMS2L2 PMS2L5 PNKP PNPLA2 PNPLA6 PNPT1 PNRC2 POGK POLDIP3 POLG2 POLK POLR1D POLR1E POLR2A POLR2E POLR2G POLR2J4 POLR2L POLR3GL POM121C POTE2 POTEE PPAPDC2 PPARBP PPFIA1 PPHLN1 PPIB PPID PPIG PPM1B PPM1K PPP1CB PPP1R13B PPP1R15B PPP1R2 PPP2R1A PPP2R2B PPP2R3A PPP2R5C PPP2R5D PPP2R5E PPP4R2 PPPDE2 PPTC7 PRAGMIN PRDM4 PRIM2 PRKAG1 PRKCA PRKCB PRKCB1 PRKCH PRKCQ PRKY PRMT2 PRPF39 PRPF8 PRR13 PRR7 PRRG4 PRRT3 PRUNE PSG3 PSG9 PSIP1 PSMA3 PSMA6 PSMB7 PSMB8 PSMB9 PSMC4 PSMC6 PSRC1 PTBP1 PTDSS2 PTGR2 PTGS1 PTK2B PTMS PTOV1 PTP4A2 PTPLAD1 PTPLAD2 PTPLB PTPN1 PTPN2 PTPN7 PTPRC PTPRE PTPRO PUM1 PURA PVALB PYCARD QARS QRICH1 RAB11FIP1 RAB11FIP4 RAB11FIP5 RAB20 RAB22A RAB24 RAB33A RAB33B RAB37 RAB3IP RAB43 RABGEF1 RAD21
RAD23A RAD23B RAD51 RAG1AP1 RALA RALGPS2 RALY RANBP9 RANGRF RAP1B RAPGEF2 RAPGEF6 RASA2 RASD1 RASSF2 RASSF5 RASSF6 RAVER1 RAX2 RAXL1 RBBP4 RBBP5 RBM11 RBM12B RBM17 RBM3 RBM39 RBM4 RBMS1 RC3H2 RCN2 REC8 RFK RFNG RFX1 RFX4 RGL4 RGMA RGPD1 RGS18 RHBDD2 RHBDF2 RHOF RHOQ RHOT1 RIOK1 RIPK3 RLN2 RN5S9 RN7SK RNASE3 RNASEH2B RNASEN RNF10 RNF103 RNF135 RNF144 RNF144A RNF213 RNF26 RNMT RNPEPL1 RNU12 RNU4ATAC RNU5A RNU6-1 RNY3 ROCK1 ROCK2 ROD1 RPAP2 RPAP3 RPL10A RPL17 RPL22 RPL23A RPL23AP13 RPL26L1 RPL29P2 RPL36 RPL37 RPL4 RPL5 RPL6 RPL7L1 RPL8 RPP40 RPRD2 RPS10P3 RPS14 RPS15 RPS18 RPS29 RPS3 RPS4X RPS5 RPS6 RPS6KA1 RPS6KA2 RPS6KA4 RPS6P1 RPS7 RPS8 RPUSD1 RRBP1 RRP1B RSAD1 RSF1 RSL1D1 RTBDN RTKN2 RTP4 RUFY2 RUNX1 RUNX3 RWDD1 RXRA RYBP S100A10 S100A6 SAC3D1 SAMD8 SAMD9L SAMSN1 SAP30L SBF1 SBK1 SCAMP1 SCARNA16 SCARNA21 SCARNA22 SCARNA5 SCN3A SDAD1 SDHAF1 SDHC SDPR SEC13 SEC16A SEC23A SEC24A SEC62 SELL SELM SELPLG SELS SELT SEMA3E SEMA4D SEMA4F SENP6 SEPN1 SEPW1 SERINC1 SERINC3 SERPINA1 SERPINB8 SERPINE2 SERPING1 SERTAD2 SESN1 SET SETD1A SETD1B SETD6 SF1 SF3A1 SF3A2 SF3B14 SF4 SFRS11 SFRS12 SFRS12IP1 SFRS2B SFRS3 SGK SGK3 SGOL2 SH2B2 SH3BGRL3 SH3BP2 SH3GL1 SH3GLB2 SH3KBP1 SH3PXD2A SIAH1 SIDT2 SIGLEC7 SIGLECP3 SIK3 SIL1 SIN3A SKA2 SKAP1 SKP2 SLA2 SLAMF8 SLC15A2 SLC24A3 SLC25A19 SLC25A23 SLC25A28 SLC25A3 SLC2A14 SLC2A6 SLC35C1 SLC35E1 SLC36A4 SLC38A1 SLC39A11 SLC39A8 SLC44A2 SLC45A3 SLC4A5 SLC5A8 SLC6A10P SLC7A1 SLC7A3 SLC7A6 SLC8A3 SLC9A4 SMA5 SMAD3 SMAD5 SMARCA5 SMARCB1 SMARCC1 SMARCC2 SMC5 SMPD1 SMYD2 SMYD3 SNAPC1 SNHG10 SNHG8 SNHG9 SNORA12 SNORA28 SNORD13 SNORD16 SNORD18C SNORD21 SNORD46 SNORD58B SNORD62B SNORD71 SNORD73A SNORD76 SNORD95 SNRPD3 SNRPE SNUPN SNURF SNX14 SNX17 SNX20 SNX7 SOCS3 SOCS4 SORBS3 SP1 SP100 SP2 SP4 SPC24 SPC25 SPCS1 SPCS2 SPG21 SPI1 SPIN1 SPNS3 SPOCK2 SPTAN1 SPTLC1 SREBF1 SRFBP1 SRM SRP19 SRP72 SRPK2 SRRM2 SS18 SSB SSBP3 SSH1 SSNA1 SSR4 ST6GAL1 ST6GALNAC4 ST6GALNAC6 STAR STARD7 STAT1 STAT4 STK40 STRN4 STX10 STX7 SULT1A2 SULT1A3 SUMF2 SUMO1 SUMO1P3 SURF6 SUV420H1 SVIL SYAP1 SYF2
SYNC1 SYNE1 SYTL2 SYTL3 TACC1 TADA1L TAF1C TAF1D TAF4 TAF8 TAF9 TAGAP TAGLN TAL1 TANK TAP1 TARP TATDN2 TBC1D10B TBC1D22A TBC1D7 TBC1D9B TBCA TBL1X TCEA2 TCEA3 TCEAL4 TCEAL8 TCEB1 TCEB2 TCERG1 TCFL5 TCL1A TCL1B TCP1 TDG TDRD7 TECR TESK1 TFEC TFIP11 TGFBR2 TGIF1 THEX1 THOC2 THOC4 TIAF1 TIAL1 TIFA TIMELESS TIMM10 TIMM22 TIMP2 TLE2 TLK1 TLN1 TLR10 TLR5 TMC6 TMCC1 TMCC3 TMEM106A TMEM107 TMEM109 TMEM111 TMEM116 TMEM126B TMEM137 TMEM156 TMEM165 TMEM185A TMEM189-UBE2V1 TMEM191A TMEM203 TMEM204 TMEM209 TMEM219 TMEM38B TMEM50B TMEM51 TMF1 TMSB4X TMTC4 TMUB1 TMX4 TNFAIP6 TNFAIP8L1 TNFRSF21 TNFRSF25 TNFRSF9 TNFSF10 TNFSF12 TNFSF13 TNFSF13B TNFSF14 TNFSF15 TNIK TNS1 TOB1 TOMM20 TOMM7 TOP1MT TOP1P1 TOP1P2 TOP2B TOX TOX2 TP53BP2 TP53INP2 TPI1 TPM4 TPP2 TPRKB TRA1P2 TRAPPC4 TRAPPC9 TRAT1 TRIM13 TRIM16L TRIM22 TRIM23 TRIM26 TRIM4 TRIM5 TRIM52 TRIM78P TRIM9 TRIOBP TROVE2 TRPC4AP TRRAP TSC22D1 TSC22D3 TSEN15 TSEN54 TSGA14 TSHZ1 TSPAN14 TSPAN5 TSTD1 TTC3 TTC4 TTN TTRAP TUBA1A TUBA3E TUBB4Q TUFM TULP4 TUT1 TWSG1 TYMP TYSND1 U2AF1 UBA3 UBAP2L UBE1C UBE2D1 UBE2D2 UBE2E3 UBE2H UBE2J1 UBE2L6 UBE2O UBE2V1 UBE2W UBE3B UBE4B UBN2 UBXN7 UCRC UGCGL1 UGP2 UHMK1 UHRF2 UIMC1 UNC84B UNC93B1 UNKL UPF3A UQCRH URG4 USH1G USP10 USP13 USP14 USP18 USP33 USP47 USP48 USP5 USP53 USP6 USP9X UXT VAC14 VAMP2 VAV3 VDAC2 VEGFB VEZT VHL VPS13B VPS13C VPS28 VPS41 VPS52 VSIG1 VWF WAS WASH2P WBP11 WBP2 WDFY3 WDR1 WDR23 WDR48 WDR73 WDR74 WDR75 WDR82 WHAMM WNK1 WRB WRNIP1 WWP1 WWP2 XAB2 XAF1 XRCC4 XRCC6 XRN1 XRN2 XYLT2 YES1 YIF1A YIPF4 YOD1 YPEL3 YTHDC1 YY1 ZBED4 ZBTB16 ZBTB3 ZBTB4 ZBTB42 ZBTB43 ZBTB9 ZC3H4 ZC3H5 ZCCHC10 ZCCHC14 ZDHHC4 ZDHHC9 ZFAND1 ZFHX3 ZFP14 ZFP30 ZFP37 ZFP91 ZFPM1 ZFYVE19 ZFYVE27 ZMYND11 ZNF12 ZNF121 ZNF131 ZNF136 ZNF142 ZNF148 ZNF185 ZNF204 ZNF223 ZNF234 ZNF24 ZNF252 ZNF256 ZNF260 ZNF274 ZNF281 ZNF282 ZNF319 ZNF32 ZNF320 ZNF329 ZNF337 ZNF33A ZNF345 ZNF364 ZNF37A ZNF395 ZNF420 ZNF430 ZNF438 ZNF441 ZNF444 ZNF471 ZNF502 ZNF518B ZNF524 ZNF526 ZNF529 ZNF540 ZNF544 ZNF559 ZNF562 ZNF567 ZNF580 ZNF589 ZNF609 ZNF615 ZNF626 ZNF638
ZNF641 ZNF644 ZNF669 ZNF683 ZNF716 ZNF738 ZNF773 ZNF792 ZNF805 ZNF818 ZNF828 ZNF831 ZNF860 ZNF91 ZNF92 ZNF93 ZRSR2 ZSCAN2 ZYG11B CREB1 CLOCK ZNF398 ATXN7L3B MTRNR2L1 ZBED3 PPM1A ZNF160 RORA FBXO22 TRDV3 CCNG2 DDI2 TTC39C ETS1 ZMAT3 LRRC8B ZNF33B TMEM33 GDF11 TNRC6C RAB27B

TABLE 17 Gene Listing of Commonly Dysregulated Genes in Discovery and Replication Toddlers
ABCG1 ACACB AGER AGPAT3 AKR1C3 AKR1D1 AKT1 ALG10B ANKRD22 ANKRD44 ANXA1 ANXA3 AP2A1 ARAP3 ARHGAP10 ARHGAP25 ARHGAP30 ARHGAP9 ARL5A ASCC3 ASMTL ATG2A ATG4C ATP1B1 ATP5A1 AXIN1 BIRC3 BMF BPGM BRDG1 C10orf4 C11orf82 C14orf102 C16orf53 C19orf59 C1GALT1 C1GALT1C1 C1QBP C20orf30 C3orf17 C3orf38 C3orf58 C4orf16 C4orf32 C4orf34 C6orf150 C7orf28A C9orf127 C9orf72 C9orf85 CABC1 CABIN1 CAMK1D CAPZA1 CARD17 CBFB CBX7 CCDC117 CCDC50 CCDC90B CCDC91 CCNY CCNYL1 CCS CCT6P1 CD274 CD300LB CD3E CD84 CD97 CDAN1 CDC2L6 CDK2AP2 CDK5RAP3 CENPV CEPT1 CERK CHES1 CHM CHMP5 CHORDC1 CHPF2 CKS2 CLEC4D CLIC4 CNIH4 COMMD8 COPS2 CPNE1 CRY2 CSNK1E CTDP1 CTDSP1 CTSF CXorf57 CYP20A1 DAPK2 DBI DBP DCK DCTN6 DDIT4 DDX60 DHPS DHRS3 DHX34 DLEU1 DNAJB14 DNHD2 DPEP2 DPM3 DRD4 DTWD1 DUSP22 DYNLT3 ECT2 EEF2K EIF3G ENO2 ENO3 EPN2 EPSTI1 ETNK1 FABP5 FAM134A FAM134B FAM153B FAM91A2 FANCL FBXO5 FEZ1 FHL3 FICD FKTN FLJ39639 FOXJ2 FYN FYTTD1 GALT GATAD2B GATS GBP1 GCH1 GNAI2 GNPDA2 GOLPH3L GPR141 GPR68 GPR84 GRASP GSTM1 GSTM2 GTF3C6 GTPBP8 HCCS HERC2 HHEX HIBCH HINT3 HK1 HNRPK HPCAL4 HRSP12 HSPA9 IFI16 IFI27 IGF2BP3 IL6ST IMPA2 INADL IP6K1 IQCB1 ITFG1 ITGAX ITPKB KCNG1 KDM6B KHNYN KIAA0247 KIAA1279 KIAA1715 KIF2A KLHL20 KPNA2 KPNA6 LACTB LDHA LFNG LGALS3BP LGALS8 LMF2 LMTK3 LOC202134 LOC387934 LOC389816 LOC442582 LOC643272 LOC648733 LOC650898 LOC652837 LOC653105 LOC654121 LOC729843 LPIN2 LRRC26 LYPLA1 MAD2L1 MAD2L1BP MAP1LC3A MAPRE3 MAST3 ME2 METAP1 MGAT3 MGC12760 MGC13005 MGC3020 MGC40489 MID1IP1 MLKL MRPL3 MRPL47 MRPS10 MS4A1 MS4A2 MS4A4A MSH2 MTHFD2 MUT MYH9 MYO9B MYOM1 Magmas N4BP2L1 NAALADL1 NAGLU NAT6 NBN NCBP2 NCOR2 NCR3 NDE1 NDRG3 NFATC2IP NFIC NFKBIB NLRP1 NNT NR3C2 NUCB2 NUDT16L1 OMA1 OTOF PACS1 PAFAH2 PARP9 PCYOX1 PDCD10 PDZD4 PFTK1 PGGT1B PHAX PHC3 PHF14 PHF2 PHKB PI4K2B PIAS2 PIGX PIK3CD PITPNC1 PKIA PLCB2 PLD3 PLEKHF1 PLSCR1 PML PNPLA2 PNRC2 POLR1E PPM1K PPPDE2 PSMA3 PSMA6 PSMC6 PTDSS2 PTMS PTP4A2 PTPLAD1 PTPN2 PTPRE PTPRO RAB37 RAD23B
RAD51 RALY RASSF2 RBM3 RFNG RFX4 RGPD1 RHBDD2 RHOT1 RIOK1 RN7SK RPAP3 RPL6 RPP40 RPS6KA2 RPS7 RTP4 SAMD9L SAMSN1 SDHAF1 SELL SELM SEMA4D SERPINB8 SF1 SFRS12IP1 SFRS3 SGOL2 SH3GL1 SIGLEC7 SIGLECP3 SLC35E1 SLC39A8 SLC44A2 SLC45A3 SMARCA5 SMARCC2 SNX14 SOCS4 SORBS3 SP100 SPC25 SPNS3 SPTLC1 SREBF1 SRFBP1 SRP72 SS18 SSB SSBP3 STAT1 STRN4 SUMO1P3 SYTL3 TADA1L TANK TBC1D9B TBCA TBL1X TCEB2 TDG THEX1 THOC2 TIFA TLR10 TMEM126B TMEM165 TMTC4 TNFRSF21 TNFSF12 TNFSF14 TP53INP2 TPRKB TRIM22 TRIM78P TRPC4AP TSC22D1 TSC22D3 TSEN54 TSGA14 TSPAN14 UBA3 UBE4B UGP2 UNKL VAMP2 VEZT VPS13B VPS28 VPS41 WDR73 WNK1 WRB XYLT2 YES1 YPEL3 YY1 ZBTB16 ZBTB4 ZFPM1 ZFYVE27 ZNF24 ZNF345 ZNF395 ZNF430 ZNF518B ZNF526 ZNF567 ZNF589 ZNF626 ZNF92
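Table 17 is described as listing genes that are commonly dysregulated in the Discovery and Replication toddlers. Assuming "commonly" means a gene was flagged in both cohorts' analyses (the exact selection criteria are not restated in this section), the listing corresponds to a set intersection. The minimal sketch below uses hypothetical placeholder symbols, not the application's per-cohort results, and the function name is illustrative only.

```python
def commonly_dysregulated(discovery: set[str], replication: set[str]) -> list[str]:
    """Genes flagged as dysregulated in BOTH cohorts, sorted alphabetically."""
    return sorted(discovery & replication)


# Hypothetical placeholder symbols; not the actual Discovery/Replication lists.
discovery_hits = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
replication_hits = {"GENE_A", "GENE_C", "GENE_E"}

print(commonly_dysregulated(discovery_hits, replication_hits))  # ['GENE_A', 'GENE_C']
```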

TABLE 18 Gene Listing of DNA-Damage Genes
14-3-3, ATM, Bax, Bcl-2, ICAD, CBP, CDK1 (p34), CREB1, DNA ligase IV, FasR (CD95), G-protein alpha-s, HSP90, I-kB, PHAP1 (pp32), MRE11, PCNA, AKT (PKB), PLC-beta, PP2A catalytic, RPA3, RAD23A, Rad51, Rb protein, p90Rsk, STAT1, SOS, PDK (PDPK1), XRCC4, Adenylate cyclase, Beta-catenin, c-Abl, Calmodulin, Caspase-7, Caspase-8, Cyclin D, Nibrin, ERK1/2, ATR, Ubiquitin, PI3K cat class IA, PI3K reg class IA, MEK4 (MAP2K4), C-IAP2, c-IAP1, HSP27, PKC-alpha, PKA-cat (cAMP-dependent), p300, Histone H1, Caspase-2, POLR2A, Cyclin A, HSP70, SUMO-1, Lamin B, MKK7 (MAP2K7), PML, NCOA1 (SRC1), SP1, MSH2, TDG, GLK (MAP4K3), PLK3 (CNK), FHL2, Ku70, SET, WRN, PP2C, Bim, BMF, MAP1, RAP-1A, Caspase-4, EGR1, CDC25B, NURR1, POLD cat (p125), Chk1, Keratin 1, NAIP, Beta-arrestin2, 14-3-3 theta, Artemis, BFL1, Centrin-2, Chk2, ERCC-1, ERCC8, FANCL, HMG2, Histone H2B, La protein, Lamin B1, MSH3, MUNC13-4, MutSbeta complex, N-myristoyltransferase, NFBD1, NUMA1, PIAS2, PNKP, POLD reg (p12), PTOP, RAD23B, RBBP8 (CtIP), RPL22, Rab-27A, Sirtuin, USP1, VDAC2, XAB2, cPKC (conventional), hnRNP A1, hnRNP C, p23 co-chaperone

TABLE 19 Gene Listing of Mitogenic Signaling Genes
Bax, Bcl-2, ERK5 (MAPK7), C3G, CBP, CDK1 (p34), CREB1, CRK, c-Cbl, CDC42, ErbB2, FasR (CD95), Fyn, G-protein alpha-i family, G-protein alpha-s, RASA2, HSP90, I-kB, JAK2, LIMK2, Lck, NF-AT4 (NFATC3), PAK2, PCNA, AKT (PKB), PKC-zeta, PKR, PLC-beta, PLC-gamma, Pim-1, Pyk2 (FAK2), Rb protein, p90Rsk, STAT1, SOS, Tyk2, PDK (PDPK1), VEGF-B, Adenylate cyclase, Beta-catenin, Calmodulin, Caspase-7, Cyclin D, gp130, ERK1/2, SKP2, Paxillin, PKC, Ubiquitin, PI3K cat class IA, PI3K reg class IA, RPS6, MEK4 (MAP2K4), C-IAP2, c-IAP1, MAPKAPK2, HSP27, PKC-beta, PKC-alpha, ILK, PKA-cat (cAMP-dependent), FOXO3A, RalA, p300, MRLC, COX-1 (PTGS1), GMF, DCOR, Cyclin A2, PKC-theta, IRS-2, SH2B, MKK7 (MAP2K7), NCK1, N-Ras, NCOA1 (SRC1), SP1, IBP, DOK2, TPL2 (MAP3K8), GLK (MAP4K3), RASA3, Sequestosome 1 (p62), ICAM1, BCR, PLAUR (uPAR), RAP-1A, PDZ-GEF1, MAGI-1 (BAIAP1), Tuberin, EGR1, NFKBIA, CDC25B, SOCS3, MEF2C, PLGF, ERK1 (MAPK3), Angiopoietin 1, PLC-gamma 1, p90RSK1, LPP3, PI3K reg class IA (p85-alpha), Neutral sphingomyelinase, DIA1, 14-3-3 zeta/delta, Acid sphingomyelinase, BFL1, BUB1, CCL2, CERK1, GIPC, GLCM, MLCP (cat), NCOA3 (pCIP/SRC3), PAQR7, PAQR8, PDGF-D, PEDF-R (iPLA2-zeta), PELP1, PI3K cat class IA (p110-delta), PI3K reg class IA (p85), PKA-cat alpha, RGL2, RNTRE, ROCK1, ROCK2, SPT1, TSAD, Tcf (Lef), Tob1, WNK1

TABLE 20 Top 30 Genes with the Highest Gene Connectivity Correlated with Brain Size Variation in ASD
Module greenyellow: DLGAP5 HMMR CEP55 CDKN3 CCNB2 ASPM KIF11 KIAA0101 OIP5 TOP2A BUB1 NUSAP1 TYMS NCAPG CDC45L CCNA2 MCM10 CHEK1 UBE2C AURKA CDC2 CENPE PTTG3P PRC1 CDCA5 MELK UHRF1 MND1 ZWINT GMNN
Module grey60: TXNDC5 TNFRSF17 ABCB9 MGC29506 CD38 FKBP11 SEC11C LOC647450 LOC647506 LOC652493 LOC652694 CRKRS IGJ CAMK1G GGH CAV1 GLDC DNAJB11 ELL2 FAM46C IGLL1 ARMET LOC642113 ITM2C HSP90B1 LOC642131 SLC25A4 LOC651751 LOC390712 SDF2L1
Module midnightblue: SH3BGRL2 CTDSPL GP9 PDE5A TUBB1 ITGB5 ESAM SEPT5 TREML1 PTGS1 TSPAN9 CTTN NRGN PTCRA SELP ITGA2B MARCH2 MYLK SDPR ALOX12 PEAR1 ACRBP ABLIM3 F13A1 CMTM5 GNG11 DDEF2 C7orf41 ASAP2 ANKRD9
Module yellow: SDCBP LRRK2 RP2 FAM49B MNDA UBE2W LOC100129960 NDUFS3 DDX3X PLXNC1 MCL1 JMJD1C CENTB2 ST8SIA4 SNX13 SNX10 ELOVL5 C12orf35 SPAG9 MRPS12 CYB5R4 LOC729279 LYST POMGNT1 SPOPL PELI1 OGFRL1 SHOC2 CDC42EP3 ACSL4
Module cyan: LOC440313 SPRYD3 LOC642469 DPYSL5 GPR175 EPB42 SERPINA13 LOC100131726 MUC6 HBD SLC25A39 AHSP SELENBP1 LOC100132499 RNF213 ROPN1B LOC100131391 LOC100131164 STRADB IFIT1L FBXO7 UBXN6 EPB49 HBQ1 ALAS2 SEMA6B TESC HBE1 GUK1 LOC652140
Module turquoise: ITPRIP NUMB REPS2 AQP9 SEPX1 STX3 FCGR2A RNF149 BASP1 NCF4 RBM47 NFIL3 MXD1 PHC2 LIMK2 TLR1 GK BCL6 CSF3R GCA LOC730278 SLC22A4 NDEL1 CEACAM3 RALB PFKFB4 LOC654133 PSG3 MANSC1 CXCR1

TABLE 21 Top 30 Genes with the Highest Gene Connectivity Correlated with Brain Size Variation in Control
Module greenyellow: NCAPG HMMR DLGAP5 CCNB2 CDC20 TOP2A C12orf48 CDKN3 CDC45L CEP55 NUSAP1 BUB1 KIF11 CHEK1 ASPM TYMS CDC2 NEK2 DEPDC1B PTTG3P PTTG1 KIAA0101 AURKA OIP5 MND1 MELK CCNA2 GMNN CDCA5 CCNE2
Module grey60: TNFRSF17 MGC29506 TXNDC5 LOC647450 LOC652493 ABCB9 LOC652694 LOC642113 IGJ LOC647506 CD38 GLDC SEC11C IGLL1 CAMK1G CRKRS FKBP11 ARMET CAV1 FAM46C GGH IGLL3 ITM2C LOC390712 LOC729768 HSP90B1 PRDX4 ELL2 GMPPB DNAJB11
Module midnightblue: ITGB5 PDE5A ITGB3 TSPAN9 GP9 TUBB1 PPBP CTDSPL CTTN SDPR PTGS1 NRGN NCKAP5 SEPT5 PTCRA SH3BGRL2 ACRBP ITGA2B ALOX12 TREML1 C5orf4 ESAM ELOVL7 F13A1 GNG11 PROS1 DDEF2 GP1BA ANKRD9 ASAP2
Module yellow: SDCBP LRRK2 ZFYVE16 NDUFS3 CPSF4 FAM49B DCTPP1 DNAJC8 KRTCAP2 TMEM154 WDR54 MEGF9 LOC391811 LOC100129960 CMTM6 PELI1 NDUFS8 NUDT1 PLXNC1 SLC12A6 PAFAH1B3 ADSL SPAG9 NHP2 ITPA NDUFB8 SLC40A1 CPEB2 MRPS12 APAF1
Module cyan: LOC642469 GPR175 LOC100131726 LOC440313 SPRYD3 MUC6 AHSP HBD SLC25A39 LOC100132499 STRADB EPB42 LOC389599 DPYSL5 SERPINA13 FBXO7 EPB49 UBXN6 LOC100131164 LOC100131391 RNF213 MIR98 SELENBP1 MRPL40 LOC645944 C1orf77 LOC728453 PMM1 HBE1 LOC100130255
Module turquoise: GCA NUMB PFKFB4 REPS2 TLR6 SRGN RNF149 TLR1 ACSL1 CSF3R ITPRIP LIMK2 FCGR2A SEPX1 PHC2 LILRB3 STX3 GK FRAT2 FPR1 NFIL3 PSG9 LIN7A S100A11 TNFRSF1A RALB AQP9 NCF4 FTHL12 LAMP2

TABLE 22 Top 30 Genes with the Highest Gene Significance Correlated with Brain Size Variation in ASD
Module greenyellow: RNASEH2A C6orf129 EBP STOML2 RRM1 RAPGEF5 STMN1 CENPM CCNF TOP2A PSMB7 KIF20A FAM19A2 PDCD1 BIRC5 LOC441455 CDCA5 PHF19 FEN1 MCM2 CCNB2 MND1 RACGAP1 PTTG3P MTHFD1L FABP5L2 CHST12 UBE2T PLS3 CENPA
Module grey60: PDIA4 RPN2 MOXD1 MTDH IGLL3 CRKRS HYOU1 LOC647506 BCL2L11 KLHL14 SDF2L1 IGLL1 ABCB9 EAF2 DENND5B IRF4 ARMET TNFRSF17 ITM2C PDIA5 LOC652694 DNAJB11 SPATS2 LOC647460 SEC11C GLDC POU2AF1 LOC541471 C14orf145 MGC29506
Module midnightblue: RNF11 PDGFC MPP1 CDC14B TUBB1 TPM1 ZNF185 P2RY12 MMD SDPR NCKAP5 SPOCD1 FHL1 MARCH2 ARHGAP18 ASAP2 VCL FRMD3 CALD1 GNG11 GUCY1B3 LY6G6F F13A1 LEPR JAM3 MYLK BMP6 ELOVL7 PGRMC1 SPARC
Module yellow: BLMH DDT SSFA2 PHPT1 TLR8 HDAC1 OSGIN2 FAM159A MAPK14 NDUFB9 LAGE3 DMXL2 PDCD2L SLC2A1 NTHL1 STRA13 NPM3 HIST1H2AC C6orf108 LCP2 CLPP NDUFA7 MRPL55 MCTP1 WBSCR22 MFSD1 LMAN2 CDK10 FAM105A DUSP6
Module cyan: EEF1D LOC728453 ZNF33B PTDSS1 PMM1 TULP4 ARL1 CSDA WDR40A LOC731985 TRIM58 SSNA1 SF4 RPS29 ADIPOR1 SNCA ERCC5 GALT LOC100132499 LOC653635 LOC440359 ANKRD54 LOC130773 PDZK1IP1 LOC441775 MRPL40 LOC100130255 WDR70 MARCH8 VIL2
Module turquoise: LOC346887 C9orf72 LAX1 IGFBP4 C3orf26 NOTCH2 RGS18 NCOA4 TRIB2 MAX BID LOC641710 CDS2 MRPS9 B4GALT5 FAM193B DSE LOC388707 SLAMF6 IRAK3 MEF2A PARP1 SNN ARPC5 AUTS2 SNX6 FAM98A C9orf66 HEY1 ALOX5

TABLE 23 Top 30 Genes with the Highest Gene Significance Correlated with Brain Size Variation in Control
Module greenyellow: CDC2 KIF11 NUSAP1 MELK PRC1 DTL DEPDC1B TTK OIP5 CCNA2 UHRF1 TYMS KIF20A KPNA2 MCM10 UBE2C TK1 CENPE NUF2 ASPM KIAA0101 DLGAP5 CDC20 CCNE2 DONSON EZH2 GMNN MGC40489 NEK2 NCAPG
Module grey60: IGLL3 CRKRS CAMK1G PERP HSPA13 SPATS2 IGLL1 SLC25A4 GGH CD38 ELL2 UAP1 MGC29506 BIK LOC401845 PRDX4 TNFRSF17 XBP1 SEC61B GLDC LOC649210 LOC652694 LOC652493 FKBP11 IGJ CAV1 TXNDC5 LOC649923 LOC647506 LOC652102
Module midnightblue: SMOX ARHGAP18 SPARC HIST1H2AG C15orf26 PLOD2 C16orf68 ARHGAP21 TREML1 XPNPEP1 ANKRD9 TAL1 C5orf62 C11orf59 KIFC3 LOC650261 LOC441481 ESAM TSPAN9 GP9 GNG11 GRB14 CMTM5 ITGA2B CLDN5 CALD1 PF4V1 LY6G6F TUBA4A GPX1
Module yellow: ZNF426 ELMOD2 ILKAP LOC644739 PRDM1 PDPK1 LOC653344 TGFBR2 UPF2 ZNF480 DMAP1 CCDC28B VARS FAM44A NTHL1 KLHDC4 MYO9A OTUD1 C10orf118 IPMK TCP11L2 PHF3 BTBD2 PHF20L1 PCSK7 STRA13 PDE4B KIF22 RTN4 TMEM106C
Module cyan: SNORD8 ZNF33A AKAP7 C20orf108 BLVRB UBE2F DERL2 PPIG EWSR1 SF4 HPS1 C17orf68 HEMGN DSCAM TESC LOC100134108 NDUFAF1 LOC100134102 LOC100130769 HECTD3 GSPT1 MAPK13 KRT1 SRRD SNF8 PPP2R2A IGF2BP2 LOC652968 RN5S9 PDZK1IP1
Module turquoise: PPARBP PPOX ZNF551 ZNF135 ACOT4 MSTO1 CEP290 MPZL1 CPPED1 KIAA1641 METT11D1 NUP43 BTBD6 OPTN METTL2A USP36 TMEM45B TOP3B XYLT2 ZNF805 ALG9 TBK1 IRAK1BP1 DIS3L EFHC2 TMEM217 MGC42367 LRRC25 IL8RB DCAF7

TABLE 24 Top 30 Genes with the Highest Module Membership Correlated with Brain Size Variation in ASD
Module greenyellow: DLGAP5 CDKN3 HMMR OIP5 KIAA0101 CEP55 NUSAP1 KIF11 BUB1 TOP2A ASPM CCNA2 CCNB2 TYMS CHEK1 NCAPG PTTG3P CDC45L AURKA MELK MCM10 CDC2 CENPE GMNN UBE2C PRC1 PTTG1 CDCA5 MND1 TTK
Module grey60: TXNDC5 ABCB9 TNFRSF17 MGC29506 FKBP11 CD38 CRKRS SEC11C LOC647506 CAMK1G LOC647450 LOC652694 CAV1 LOC652493 GGH DNAJB11 FAM46C ITM2C ELL2 GLDC IGLL1 IGJ ARMET LOC390712 LOC642131 HSP90B1 SLC25A4 LOC642113 IGLL3 LOC651751
Module midnightblue: SH3BGRL2 GP9 CSTDPL PDE5A TUBB1 ESAM ITGB5 SEPT5 TREML1 PTGS1 CTTN PTCRA MYLK NRGN MARCH2 SELP ALOX12 TSPAN9 SDPR ACRBP ABLIM3 PEAR1 DDEF2 F13A1 ITGA2B GNG11 ASAP2 CMTM5 DNM3 C7orf41
Module yellow: NDUFS3 POMGNT1 LOC729279 CPSF4 DGCR6 MRPS12 AIP POLR3C PAFAH1B3 KRTCAP2 MRPL37 ADSL L3MBTL2 BMS1 NUDT1 IMP4 RPUSD2 VEGFB LAGE3 WDR54 C19orf53 LAT C11orf2 EIF3B B4GALT3 APRT DHPS TRAPPC6A NDUFS8 C17orf70
Module cyan: LOC642469 SPRYD3 LOC440313 SERPINA13 HBD EPB42 LOC100131726 DPYSL5 AHSP SLC25A39 GPR175 MUC6 SELENBP1 ROPN1B LOC100131164 IFIT1L LOC100131391 STRADB RNF213 FBXO7 HBQ1 UBXN6 EPB49 ALAS2 TESC SESN3 SEMA6B WDR40A HBE1 TMEM111
Module turquoise: ITPRIP REPS2 SEPX1 STX3 AQP9 FCGR2A NFIL3 NUMB LOC730278 PSG3 BASP1 TLR1 RNF149 NCF4 LOC100134728 RALB PHC2 LIMK2 TLR8 GK PSG9 SLC22A4 CCPG1 CEACAM3 FTHL12 FAM49A KCNJ2 GCA FPR1 LOC729009

TABLE 25 Top 30 Genes with the Highest Module Membership Correlated with Brain Size Variation in Control
Module greenyellow: C12orf48 HMMR NCAPG CDKN3 DLGAP5 CCNB2 CDC20 CDC45L TOP2A CHEK1 PTTG3P NUSAP1 CEP55 PTTG1 MND1 CDC2 BUB1 DEPDC1B NEK2 KIAA0101 KIF11 AURKA GMNN OIP5 TYMS ASPM CCNE2 NUF2 CCNA2 CDCA5
Module grey60: MGC29506 TNFRSF17 TXNDC5 ABCB9 LOC647450 LOC652694 LOC652493 LOC642113 GLDC LOC647506 CD38 IGJ SEC11C IGLL1 CRKRS FKBP11 CAV1 BUB1 ARMET CAMK1G FAM46C GGH ITM2C LOC390712 IGLL3 DNAJB11 SPATS2 HSP90B1 XBP1 ELL2
Module midnightblue: ITGB5 GP9 PDE5A TSPAN9 SDPR TUBB1 CTTN ITGB3 PTCRA NRGN PPBP PTGS1 SEPT5 NCKAP5 CTDSPL ESAM ALOX12 SH3BGRL2 TREML1 F13A1 ACRBP C5orf4 GP1BA ELOVL7 ITGA2B GNG11 DDEF2 PROS1 TNFSF4 ANKRD9
Module yellow: CPSF4 NDUFS3 DNAJC8 LOC391811 ITPA PAFAH1B3 KRTCAP2 ADSL NDUFS8 WDR54 DCTPP1 SAE1 NDUFB8 NUDT1 SCAMP3 CUTA C19orf48 CCT7 NHP2L1 NHP2 PDXP PTPRCAP LSM2 MRPS12 ATIC TTC4 CCT3 NXT1 IMP3 DPH2
Module cyan: LOC642469 GPR175 AHSP LOC100131726 LOC440313 SPRYD3 MUC6 HBD SLC25A39 EPB49 EPB42 STRADB LOC389599 FBXO7 UBXN6 DPYSL5 LOC100131164 SERPINA13 SELENBP1 LOC100131391 RNF213 HBE1 TRIM58 MYL4 SNCA SEMA6B CSDA LOC440359 ROPN1B HBQ1
Module turquoise: GCA PFKFB4 SRGN TLR6 NUMB SEPX1 TLR1 FTHL12 ACSL1 LIMK2 MNDA S100A11 NFIL3 ITPRIP RALB LIN7A TLR8 STX3 LILRB3 PSG9 FCGR2A GK LOC730278 FTHL7 PHC2 REPS2 PGCP FPR1 RNF149 LOC729009
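Tables 20 to 25 rank genes within co-expression modules by gene connectivity, gene significance and module membership. These measures are not defined in this section; the sketch below assumes the standard weighted gene co-expression network analysis (WGCNA) definitions: intramodular connectivity as the row sum of a soft-thresholded adjacency |cor|^β (β = 6 is an assumed value), gene significance as the absolute correlation of a gene with the trait (here, a brain size measure), and module membership as the correlation of a gene with the module eigengene, taken as the first principal component of the standardized module expression. The data and function names are hypothetical, and this is not presented as the pipeline used in the application.

```python
import numpy as np


def module_eigengene(expr: np.ndarray) -> np.ndarray:
    """First principal component of a (samples x genes) module expression matrix."""
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)  # standardize each gene
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0] * s[0]
    # The SVD sign is arbitrary; orient the eigengene to agree with mean module expression.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene


def intramodular_connectivity(expr: np.ndarray, beta: float = 6.0) -> np.ndarray:
    """Row sums of the soft-thresholded adjacency |cor|**beta, excluding self-links."""
    adjacency = np.abs(np.corrcoef(expr, rowvar=False)) ** beta  # genes are columns
    np.fill_diagonal(adjacency, 0.0)
    return adjacency.sum(axis=1)


def gene_significance(expr: np.ndarray, trait: np.ndarray) -> np.ndarray:
    """Absolute Pearson correlation of each gene with the trait."""
    return np.array([abs(np.corrcoef(expr[:, j], trait)[0, 1]) for j in range(expr.shape[1])])


def module_membership(expr: np.ndarray, eigengene: np.ndarray) -> np.ndarray:
    """Pearson correlation of each gene with the module eigengene."""
    return np.array([np.corrcoef(expr[:, j], eigengene)[0, 1] for j in range(expr.shape[1])])


def top_genes(scores: np.ndarray, names: list[str], k: int = 30) -> list[str]:
    """Gene names ranked by descending score, truncated to the top k."""
    return [names[i] for i in np.argsort(-scores)[:k]]


# Synthetic data only: 40 hypothetical toddlers x 10 hypothetical genes in one module.
rng = np.random.default_rng(0)
latent = rng.normal(size=40)                          # shared module activity
expr = latent[:, None] + 0.5 * rng.normal(size=(40, 10))
brain_volume = latent + rng.normal(size=40)           # hypothetical trait
names = [f"GENE{j}" for j in range(10)]

me = module_eigengene(expr)
kin = intramodular_connectivity(expr)
gs = gene_significance(expr, brain_volume)
mm = module_membership(expr, me)
print(top_genes(kin, names, k=3), top_genes(gs, names, k=3), top_genes(mm, names, k=3))
```

Ranking each score and keeping the first 30 entries per module would yield listings of the same form as Tables 20 to 25.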

Discussion

In this naturalistic study of autism brain size and gene expression conducted during very early development, evidence of specific early functional genomic pathology related to brain development and size in vivo in ASD toddlers was identified. Results show that abnormal brain development and size in ASD toddlers involve disruption of cell cycle and protein folding networks plus abnormal functioning of cell adhesion, translation and immune gene networks. Also, dysregulation of DNA-damage, cell cycle regulation, apoptosis, mitogenic signaling, cell differentiation and immune system response gene networks was replicated in both ASD study groups. Several of these gene networks have previously been reported to be disrupted in the prefrontal cortex of postmortem ASD children². Thus, the postmortem and the present in vivo evidence raise the hypothesis that very early, probably prenatal, disruption of several key developmental gene networks leads to the known defects of abnormal neuron number⁶, brain^(6-9,11,12) and body²⁷ growth, and synaptic development and function²⁸, as previously reported.^(7,11,29-31)

In animal model studies of the brain,^(32,33) cell cycle and protein folding networks impact cerebral cortical neuron production and synapse development, respectively, and therefore brain and cortical size and function. Using a novel approach that combines MRI and gene expression, it was discovered that gene expression signals of both networks are detectable in the blood of control toddlers and, remarkably, are strongly correlated with brain and cerebral size, including cortical surface area. Variations in brain size in ASD toddlers are only weakly correlated with cell production and protein folding expression levels, and instead are more strongly related to a variety of other functions, namely cell adhesion, immune/inflammation, translation and other developmental processes. Thus, even given similar brain sizes or cortical surface areas in ASD versus control toddlers, the genetic foundations for brain development and growth are apparently distinctly different. Dysfunction of cell cycle processes has long been theorized to underlie brain growth pathology in ASD⁷. The present evidence, along with recent evidence of a 67% overabundance of prefrontal cortical neurons in ASD boys⁶, underscores the relevance of this theory to elucidating the molecular and cellular developmental neuropathology and origins of ASD.

Dysregulation of cell adhesion networks, as well as of protein folding, in ASD toddlers likely points to underlying abnormalities of synapse development and function, as well as to global alterations of transcriptional regulation.^(34,35) Accumulation of misfolded proteins leads to the Unfolded Protein Response (UPR)³⁶. Converging evidence shows that misfolded proteins and the UPR may underlie impaired synaptic function in autism³⁷, as well as in neurodegenerative disorders³⁸. Moreover, modeling studies of neurexin and neuroligin mutations identified in autistic patients show ER retention and point to the UPR as a mechanism behind synaptic malfunction in autism^(34,39,40). Owing to the preponderance of highly penetrant mutations, disruption of synaptic cell adhesion molecules is a well-established mechanism underlying ASD pathophysiology¹⁴, and recent evidence extends its implications to dysregulation at the network level²⁸. The instant findings show that genes of the integrin family are abnormally “activated” in ASD, and thus may underlie aberrant synaptic structure and function⁴¹ as well as affect the regulation of apoptosis, proliferation, migration and cell differentiation. Integrins also play roles in modulating microglia behavior, and thereby additionally participate in the regulation of neural inflammation and immune response⁴¹.

Immune gene networks were dysregulated in both ASD study groups and were among the top networks correlated with brain size in ASD, but not TD, toddlers. Dysregulation of immune/neuroinflammation mechanisms is a strong signal in a large number of studies of older ASD children and adults.^(26,42) The present study, however, is the first to find significant dysregulation of immune/neuroinflammation gene networks at about the age of the first clinical risk signs of ASD and the first to show a relationship with ASD brain development. Recently, abnormal immune/neuroinflammation gene expression in frozen cortex tissue has been reported in two independent studies of young as well as older postmortem autism cases.^(2,28) Microglial activation, which typically occurs in association with neuroinflammation, was reported in the prefrontal cortex in ASD across all ages studied, from 2 years to adulthood.^(43,44) While immune involvement has been argued to be a secondary, later abnormality in ASD, there is no experimental evidence to favor that idea over the possibility that ASD involves prenatal immune alterations, as demonstrated by studies modeling prenatal maternal immune activation (MIA) in rodents⁴⁵. Abnormal cell cycle control and cortical cell number strongly point to prenatal origins, and whether and how they and other genetic dysregulation and pathological cellular events intersect with immune alterations deserves careful investigation. In either event, this study provides the first evidence that immune gene networks are dysregulated at the age of first clinical concern and referral, 1 to 2 years, and already relate to ASD brain development.

This study is unique in that it identified a candidate genomic signature with a high level of accuracy, specificity and sensitivity in the diagnostic classification of Discovery ASD vs. control (TD and contrast) toddlers, all of whom came from general, naturalistic population screening. The strategy, which used the 1-Year Well-Baby Check-Up Approach, allowed the unbiased, prospective recruitment and study of ASD and control toddlers as they present in community pediatric clinics, something not previously done by research groups. Thus, not only did the ASD toddlers reflect the wide clinical phenotypic range expected in community clinics, but the control toddlers also reflected the natural mix of typically developing, mild language delayed, transient language delayed, and global developmental delayed toddlers commonly seen in community clinics. Against this challenging control group, the signature surprisingly correctly identified 82.5% of Discovery ASD toddlers. The candidate signature from the discovery sample performed well in the independent replication cohort, despite a completely different version of microarray chip being used with that cohort.
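The 82.5% figure is a sensitivity: the fraction of ASD toddlers whom the signature classified as ASD. A minimal sketch of how sensitivity, specificity and overall accuracy follow from confusion-matrix counts is shown below; the counts are hypothetical and were chosen only so that sensitivity equals 82.5%, and the function name is illustrative.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),                 # ASD toddlers correctly classified
        "specificity": tn / (tn + fp),                 # controls correctly classified
        "accuracy": (tp + tn) / (tp + fn + tn + fp),   # all toddlers correctly classified
    }


# Hypothetical counts, chosen only so that sensitivity = 33/40 = 82.5%; not study data.
metrics = classification_metrics(tp=33, fn=7, tn=20, fp=5)
print(metrics)
```

Because the control group in this design mixes typically developing and delayed toddlers, specificity here reflects discrimination against all of those groups together, not against typical development alone.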

This very good level of accuracy outperforms other behavioral and genetic screens for ASD in infants and toddlers reported in the literature, especially when compared with the performance of other tests applied to the young general pediatric population (as opposed to preselected syndromic patients or ASD patients from multiplex families). For example, the M-CHAT, a commonly used parent-report screen, has very low specificity (27%)⁴⁶ and positive predictive value (PPV, 11-54%) when used in general populations^(47,48). While important strides have been made in understanding possible genetic risk factors in autism³, current DNA tests detect only rare autism cases and lack specificity⁴⁹, or confirm autism at older ages, and have not been demonstrated to be effective in ASD infants and toddlers²⁶. Thus, the candidate functional genomic signature reported here, developed from a general pediatric population, is currently the best-performing blood- or behavior-based candidate classifier in ASD infants and toddlers.
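Why a screen's PPV falls in general populations follows from Bayes' rule: at low prevalence, even a modest false-positive rate produces many false positives relative to true positives. The sketch below uses a hypothetical screen with 90% sensitivity and 90% specificity; these values and the two prevalences are illustrative assumptions, not the published M-CHAT figures.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV = P(condition | positive screen), computed with Bayes' rule."""
    true_pos = sensitivity * prevalence                  # affected and screen-positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # unaffected but screen-positive
    return true_pos / (true_pos + false_pos)


# Same hypothetical screen, applied to a low- and a higher-prevalence population.
for prevalence in (0.01, 0.2):
    ppv = positive_predictive_value(0.9, 0.9, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}")
# prevalence=0.01  PPV=0.083
# prevalence=0.20  PPV=0.692
```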

The results of this study support the model that, in a great majority of affected toddlers, ASD involves disruption of a common set of key neural developmental genetic pathways. These commonly disrupted pathways govern neuron number and survival, neuronal functional integrity and synapse formation, which are key neural developmental processes. Disruption of immune genetic networks is also involved in the majority of ASD toddlers, an effect not detected in DNA studies of gene mutations and CNVs, but one that is found in ASD prefrontal brain tissue. Evidence indicates it is no longer a question of whether immune disruption is involved in ASD, but rather why and how. A subset of genes in these common pathways, notably translation, immune/inflammation, cell adhesion and cell cycle genes, provides a candidate genomic signature of risk for autism at young ages. Knowledge of these common pathways can facilitate research into biological targets for biotherapeutic intervention and the development of accurate biomarkers for detecting risk for ASD in infants in the general pediatric population.

REFERENCES

1. Courchesne, E. et al. Unusual brain growth patterns in early life in patients with autistic disorder: an MRI study. Neurology 57, 245-54 (2001).
2. Redcay, E. & Courchesne, E. When is the brain enlarged in autism? A meta-analysis of all brain size reports. Biol Psychiatry 58, 1-9 (2005).
3. Courchesne, E. et al. Mapping early brain development in autism. Neuron 56, 399-413 (2007).
4. Stanfield, A. C. et al. Towards a neuroanatomy of autism: a systematic review and meta-analysis of structural magnetic resonance imaging studies. Eur Psychiatry 23, 289-99 (2008).
5. Vaccarino, F. M. & Smith, K. M. Increased brain size in autism: what it will take to solve a mystery. Biol Psychiatry 66, 313-5 (2009).
6. Stigler, K. A., McDonald, B. C., Anand, A., Saykin, A. J. & McDougle, C. J. Structural and functional magnetic resonance imaging of autism spectrum disorders. Brain Res 1380, 146-61 (2011).
7. Courchesne, E., Campbell, K. & Solso, S. Brain growth across the life span in autism: age-specific changes in anatomical pathology. Brain Res 1380, 138-45 (2011).
8. Lainhart, J. E. & Lange, N. Increased neuron number and head size in autism. JAMA 306, 2031-2 (2011).
9. Courchesne, E. et al. Neuron number and size in prefrontal cortex of children with autism. JAMA 306, 2001-10 (2011).
10. Chow, M. L. et al. Age-dependent brain gene expression and copy number anomalies in autism suggest distinct pathological processes at young versus mature ages. PLoS Genet 8, e1002592 (2012).
11. Pinto, D. et al. Functional impact of global rare copy number variation in autism spectrum disorders. Nature 466, 368-372 (2010).
12. Zoghbi, H. Y. & Bear, M. F. Synaptic dysfunction in neurodevelopmental disorders associated with autism and intellectual disabilities. Cold Spring Harb Perspect Biol 4 (2012).
13. Pierce, K. et al. Detecting, studying, and treating autism early: the one-year well-baby check-up approach. J Pediatr 159, 458-465 e1-6 (2011).
14. Luyster, R. et al. The Autism Diagnostic Observation Schedule-toddler module: a new module of a standardized diagnostic measure for autism spectrum disorders. J Autism Dev Disord 39, 1305-20 (2009).
15. Lord, C. et al. The Autism Diagnostic Observation Schedule-Generic: a standard measure of social and communication deficits associated with the spectrum of autism. J Autism Dev Disord 30, 205-23 (2000).
16. Mullen, E. M. Mullen Scales of Early Learning (American Guidance Service Inc., MN, 1995).
17. Sparrow, S., Cicchetti, D. V. & Balla, D. A. Vineland Adaptive Behavior Scales, Second Edition: Survey Forms Manual (Pearson Assessments, 2005).
18. Hastie, T. & Tibshirani, R. Generalized additive models for medical research. Stat Methods Med Res 4, 187-96 (1995).
19. Sanders, S. J. et al. Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron 70, 863-85 (2011).
20. Rossin, E. J. et al. Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology. PLoS Genet 7, e1001273 (2011).
21. Stranger, B. E. et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315, 848-53 (2007).
22. Luo, R. et al. Genome-wide transcriptome profiling reveals the functional impact of rare de novo and recurrent CNVs in autism spectrum disorders. Am J Hum Genet 91, 38-55 (2012).
23. Glatt, S. J. et al. Blood-based gene expression signatures of infants and toddlers with autism. J Am Acad Child Adolesc Psychiatry 51, 934-44 e2 (2012).
24. Kong, S. W. et al. Characteristics and predictive value of blood transcriptome signature in males with autism spectrum disorders. PLoS One 7, e49475 (2012).
25. Chawarska, K. et al. Early generalized overgrowth in boys with autism. Arch Gen Psychiatry 68, 1021-31 (2011).
26. Voineagu, I. et al. Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 474, 380-4 (2011).
27. Courchesne, E. & Pierce, K. Brain overgrowth in autism during a critical time in development: implications for frontal pyramidal neuron and interneuron development and connectivity. Int J Dev Neurosci 23, 153-70 (2005).
28. Clement, J. P. et al. Pathogenic SYNGAP1 mutations impair cognitive development by disrupting maturation of dendritic spine synapses. Cell 151, 709-23 (2012).
29. Baudouin, S. J. et al. Shared synaptic pathophysiology in syndromic and nonsyndromic rodent models of autism. Science 338, 128-32 (2012).
30. Mitsuhashi, T. & Takahashi, T. Genetic regulation of proliferation/differentiation characteristics of neural progenitor cells in the developing neocortex. Brain Dev 31, 553-7 (2009).
31. Good, M. C., Zalatan, J. G. & Lim, W. A. Scaffold proteins: hubs for controlling the flow of cellular information. Science 332, 680-6 (2011).
32. Falivelli, G. et al. Inherited genetic variants in autism-related CNTNAP2 show perturbed trafficking and ATF6 activation. Hum Mol Genet 21, 4761-73 (2012).
33. Mendillo, M. L. et al. HSF1 drives a transcriptional program distinct from heat shock to support highly malignant human cancers. Cell 150, 549-62 (2012).
34. Walter, P. & Ron, D. The unfolded protein response: from stress pathway to homeostatic regulation. Science 334, 1081-6 (2011).
35. Fujita, E. et al. Autism spectrum disorder is related to endoplasmic reticulum stress induced by mutations in the synaptic cell adhesion molecule, CADM1. Cell Death Dis 1, e47 (2010).
36. Matus, S., Glimcher, L. H. & Hetz, C. Protein folding stress in neurodegenerative diseases: a glimpse into the ER. Curr Opin Cell Biol 23, 239-52 (2011).
37. Zhang, C. et al. A neuroligin-4 missense mutation associated with autism impairs neuroligin-4 folding and endoplasmic reticulum export. J Neurosci 29, 10843-54 (2009).
38. De Jaco, A. et al. Neuroligin trafficking deficiencies arising from mutations in the alpha/beta-hydrolase fold protein family. J Biol Chem 285, 28674-82 (2010).
39. Milner, R. & Campbell, I. L. The integrin family of cell adhesion molecules has multiple functions within the CNS. J Neurosci Res 69, 286-91 (2002).
40. Rossignol, D. A. & Frye, R. E. A review of research trends in physiological abnormalities in autism spectrum disorders: immune dysregulation, inflammation, oxidative stress, mitochondrial dysfunction and environmental toxicant exposures. Mol Psychiatry 17, 389-401 (2012).
41. Morgan, J. T. et al. Microglial activation and increased microglial density observed in the dorsolateral prefrontal cortex in autism. Biol Psychiatry 68, 368-76 (2010).
42. Vargas, D. L., Nascimbene, C., Krishnan, C., Zimmerman, A. W. & Pardo, C. A. Neuroglial activation and neuroinflammation in the brain of patients with autism. Ann Neurol 57, 67-81 (2005).
43. Oskvig, D. B., Elkahloun, A. G., Johnson, K. R., Phillips, T. M. & Herkenham, M. Maternal immune activation by LPS selectively alters specific gene expression profiles of interneuron migration and oxidative stress in the fetus without triggering a fetal immune response. Brain Behav Immun 26, 623-34 (2012).
44. Eaves, L. C., Wingert, H. & Ho, H. H. Screening for autism: agreement with diagnosis. Autism 10, 229-42 (2006).
45. Kleinman, J. M. et al. The modified checklist for autism in toddlers: a follow-up study investigating the early detection of autism spectrum disorders. J Autism Dev Disord 38, 827-39 (2008).
46. Chlebowski, C., Robins, D. L., Barton, M. L. & Fein, D. Large-scale use of the modified checklist for autism in low-risk toddlers. Pediatrics 131, e1121-7 (2013).
47. Devlin, B. & Scherer, S. W. Genetic architecture in autism spectrum disorder. Curr Opin Genet Dev 22, 229-37 (2012).
48. Roesser, J. Diagnostic yield of genetic testing in children diagnosed with autism spectrum disorders at a regional referral center. Clin Pediatr (Phila) 50, 834-43 (2011).

Example 2

Additional Methods, Analyses, and Results

Subject Recruitment, Tracking and Developmental Evaluation

All toddlers were developmentally evaluated by a Ph.D.-level psychologist, and those younger than 3 years at the time of blood draw were tracked every 6 months until their 3rd birthday, when a final diagnosis was given. Only toddlers with a provisional or confirmed ASD diagnosis were included in this study. Toddlers were recruited either via the 1-Year Well-Baby Check-Up Approach, a new general-population-based screening approach designed to identify toddlers with an ASD around the 1st birthday, or from general community sources (e.g., referral by a friend or response to the website). In brief, the 1-Year Well-Baby Check-Up Approach uses a broad-band screening tool, the CSBS DP IT Checklist, implemented at the routine first-year pediatric exam. A recent study, which included 137 pediatricians who administered >10,000 CSBS screens, showed that 75% of toddlers who fail the screen at the 1st-year exam have a true delay (ASD, language delay, global developmental delay or another condition). While ASD toddlers were as young as 12 months at the time of blood sampling, all but 3 toddlers have been tracked and diagnosed using the ADOS toddler module³ until at least age two years, an age at which the diagnosis of ASD is relatively stable⁴⁻⁶. Toddlers received the ADOS module most appropriate for their age and intellectual capacity. In the Discovery sample, 64% of the ASD group received the ADOS-T, 31% the ADOS 1 and 5% the ADOS 2, while in the Replication sample 32% received the ADOS-T, 48% the ADOS 1 and 20% the ADOS 2. Twenty-four final diagnoses for participants older than 30 months were also confirmed with the Autism Diagnostic Interview-Revised³.
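The tracking rule (evaluation every 6 months after the blood draw until the 3rd birthday, when the final diagnosis is given) can be sketched as a schedule generator. This is a minimal illustration under stated assumptions: ages are in whole months, intervals are counted from the blood draw, and an interim visit is scheduled only if it falls at least one full interval before the 36-month final evaluation. The function name is hypothetical.

```python
def follow_up_ages(age_at_draw_months: int, interval_months: int = 6,
                   final_age_months: int = 36) -> list[int]:
    """Ages (months) of follow-up evaluations after blood draw, ending at the 3rd birthday."""
    if age_at_draw_months >= final_age_months:
        return []  # toddlers aged 3 years or older at the draw are not tracked under this rule
    # Interim visits every interval, stopping at least one interval before the final visit.
    visits = list(range(age_at_draw_months + interval_months,
                        final_age_months - interval_months + 1,
                        interval_months))
    visits.append(final_age_months)  # final diagnostic evaluation at 36 months
    return visits


print(follow_up_ages(12))  # [18, 24, 30, 36]
```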

All toddlers participated in a battery of standardized and experimental tests that included the Autism Diagnostic Observation Schedule³, the Mullen Scales of Early Learning⁷ and the Vineland Adaptive Behavior Scales⁸. Diagnoses were determined from these assessments and the criteria of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR)⁹. Testing sessions generally lasted 4 hours and occurred across 2 separate days, and the blood sample was usually taken at the end of the first day. All standardized assessments were administered by experienced Ph.D.-level psychologists.

Ethnicity or race information was self-reported by parents. Discovery subjects: the ASD group (87 subjects) comprised 44 Caucasian, 24 Hispanic, 13 mixed, 4 Asian, 1 Indian and 1 African American subject; the control group (55 subjects) comprised 38 Caucasian, 7 Hispanic, 5 mixed, 2 African American and 3 Asian subjects. Replication subjects: the ASD group (44 subjects) comprised 23 Caucasian, 13 Hispanic, 6 mixed and 2 Asian subjects; the control group (29 subjects) comprised 20 Caucasian, 4 Hispanic/Latino, 3 mixed and 1 African American subject, with 1 unreported.

To monitor health status, each toddler's temperature was taken with a digital ear thermometer immediately before the blood draw. If the temperature was higher than 99 °F, the blood draw was rescheduled for a different day. Parents were also asked about their child's health status, such as the presence of a cold or flu, and if any illness was present or suspected, the blood draw was rescheduled for a different day.

RNA Extraction, Preparation and Quality Control

Four to six mL of blood was collected into EDTA-coated tubes from toddlers on visits when they had no fever, cold, flu, infection or other illness and had not used medications for illness in the 72 hours before the blood draw. Blood samples were passed over a LEUKOLOCK filter (Ambion, Austin, Tex., USA) to capture and stabilize leukocytes and immediately placed in a −20 °C freezer.
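The health-status rules described above (reschedule if the ear temperature exceeds 99, if any illness is present or suspected, or if medication for illness was used within 72 hours before the draw) can be written as a single eligibility check. This is a minimal sketch only: the `DrawVisit` record and its field names are hypothetical, and the temperature is assumed to be in °F.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_TEMP_F = 99.0                        # draw rescheduled if temperature is higher (assumed °F)
MEDICATION_WINDOW = timedelta(hours=72)  # no illness medication within 72 h of the draw


@dataclass
class DrawVisit:
    """Hypothetical pre-draw record; field names are illustrative only."""
    ear_temp_f: float
    illness_present_or_suspected: bool
    last_illness_medication: Optional[datetime]
    scheduled_draw: datetime


def draw_may_proceed(visit: DrawVisit) -> bool:
    """Return False if the blood draw should be rescheduled for a different day."""
    if visit.ear_temp_f > MAX_TEMP_F:
        return False
    if visit.illness_present_or_suspected:
        return False
    med = visit.last_illness_medication
    if med is not None and visit.scheduled_draw - med < MEDICATION_WINDOW:
        return False
    return True


draw_time = datetime(2012, 3, 1, 9, 0)
print(draw_may_proceed(DrawVisit(98.6, False, None, draw_time)))  # True
print(draw_may_proceed(DrawVisit(99.4, False, None, draw_time)))  # False
print(draw_may_proceed(DrawVisit(98.6, False, draw_time - timedelta(hours=48), draw_time)))  # False
```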

Total RNA was extracted following standard procedures and the manufacturer's instructions (Ambion, Austin, Tex., USA). In brief, LEUKOLOCK disks were freed from RNAlater, and TRI Reagent was used to flush out the captured leukocytes and lyse the cells. RNA was then precipitated with ethanol and purified through washing and cartridge-based steps. RNA quality was assessed by the RNA Integrity Number (RIN), and values of 7.0 or greater were considered acceptable¹⁰; all processed RNA samples passed RIN quality control. RNA was quantified using a Nanodrop spectrophotometer (Thermo Scientific, Wilmington, Del., USA). Samples were prepared in 96-well plates at a concentration of 25 ng/µL.
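The RIN acceptance rule and the normalization of each sample to 25 ng/µL for plating can be sketched with the standard C1·V1 = C2·V2 dilution relation. The sample IDs, concentrations, RIN values and the 20 µL final volume below are hypothetical values chosen only to illustrate the arithmetic.

```python
from typing import Tuple

RIN_MIN = 7.0            # RIN >= 7.0 considered acceptable
TARGET_NG_PER_UL = 25.0  # plating concentration for the 96-well plates


def passes_rin(rin: float) -> bool:
    return rin >= RIN_MIN


def dilution_volumes(stock_ng_per_ul: float, final_ul: float,
                     target_ng_per_ul: float = TARGET_NG_PER_UL) -> Tuple[float, float]:
    """Return (RNA stock volume, diluent volume) in uL from C1*V1 = C2*V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_ul / stock_ng_per_ul
    return stock_ul, final_ul - stock_ul


# Hypothetical Nanodrop concentrations (ng/uL) and RIN values.
samples = {"S01": (125.0, 8.4), "S02": (50.0, 7.0), "S03": (200.0, 6.1)}
for sample_id, (conc, rin) in samples.items():
    if not passes_rin(rin):
        print(sample_id, "fails RIN QC")
        continue
    stock, diluent = dilution_volumes(conc, final_ul=20.0)  # 20 uL final volume is an assumption
    print(sample_id, f"{stock:.1f} uL RNA + {diluent:.1f} uL water")
```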

MRI Scanning and Neuroanatomic Measurement

A T1-weighted IR-FSPGR sagittal protocol (TE = 2.8 ms, TR = 6.5 ms, flip angle = 12°, bandwidth = 31.25 kHz, FOV = 24× cm, slice thickness = 1.2 mm, 165 images) was collected during natural sleep¹¹.

FSL's linear registration tool (FLIRT) rigidly registered brain images to a custom template that had previously been registered into MNI space¹². Registered images were then processed with FSL's brain extraction tool (BET) to remove skull and non-brain tissue¹³. Any remaining non-brain tissue was removed by an anatomist to ensure accurate surface measurement. Gray matter, white matter and CSF were segmented with a modified version of the FAST algorithm¹⁴ that uses partial volumes rather than neighboring voxels, increasing sensitivity for detecting thin white matter in the developing brain¹⁵. The brain was divided into cerebral hemispheres, cerebellar hemispheres and brainstem via Adaptive Disconnection¹⁶. Each cerebral hemisphere mask was subtracted from a sulcal mask generated by BrainVISA, and the result was recombined with the original FSL segmentation to remove all sulcal CSF voxels. The final hemisphere mask was reconstructed into a smoothed, 3-dimensional mesh in BrainVISA to obtain surface measures¹⁷.
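Because the segmentation relies on partial-volume estimates rather than hard labels, a tissue volume can be obtained by summing each voxel's tissue fraction and multiplying by the voxel volume. The sketch below assumes the partial-volume map (for example, one of FAST's partial-volume estimate images) and a hemisphere mask are already loaded as NumPy arrays; the function name, the toy arrays and the 1.0 × 1.0 × 1.2 mm voxel size are assumptions for illustration.

```python
from typing import Optional, Tuple

import numpy as np


def tissue_volume_ml(pve: np.ndarray,
                     voxel_dims_mm: Tuple[float, float, float],
                     mask: Optional[np.ndarray] = None) -> float:
    """Volume of one tissue class from its partial-volume map (fractions in 0..1), in mL.

    Each voxel contributes its tissue fraction times the voxel volume, so a voxel
    that is 50% gray matter adds half a voxel of gray matter.
    """
    fractions = np.clip(pve, 0.0, 1.0)
    if mask is not None:                      # e.g. restrict to one hemisphere mask
        fractions = np.where(mask, fractions, 0.0)
    voxel_mm3 = float(np.prod(voxel_dims_mm))
    return float(fractions.sum()) * voxel_mm3 / 1000.0   # 1 mL = 1000 mm^3


# Toy example: a uniform 50% gray-matter map and an 8-voxel "hemisphere" mask.
gm_pve = np.full((4, 4, 4), 0.5)
hemisphere = np.zeros((4, 4, 4), dtype=bool)
hemisphere[1:3, 1:3, 1:3] = True
print(tissue_volume_ml(gm_pve, (1.0, 1.0, 1.2), hemisphere))
```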

Gene Expression and Data Processing

RNA was assayed at Scripps Genomic Medicine (La Jolla, Calif., USA) for labeling, hybridization and scanning using the expression BeadChip pipeline (Illumina, San Diego, Calif., USA) per the manufacturer's instructions. All arrays were scanned with the Illumina BEADARRAY READER and read into Illumina GENOMESTUDIO software (version 1.1.1). Raw data were exported from Illumina GENOMESTUDIO, and data pre-processing was performed using the lumi package¹⁸ for R (R-project.org) and Bioconductor (bioconductor.org)¹⁹.

Several quality criteria were used to exclude low-quality arrays, as previously described.^(20,21) In brief, low-quality arrays were those with poor signal intensity (raw intensity box plots and average signal more than 2 standard deviations below the mean), deviant pairwise correlation plots, deviant cumulative distribution function plots, deviant multidimensional scaling plots, or poor hierarchical clustering²². Five samples (four ASD and one control) were identified as low quality because of poor detection rates, different distributions and curved dot plots, and were removed prior to normalization. Eighteen (18) samples had one replicate each, and all pairwise plots of each replicate pair had a correlation coefficient of 0.99. Hierarchical clustering of these replicated samples showed that for 13 samples the two replicates clustered together; for these, the B array was arbitrarily chosen for the following steps. For the remaining 5 replicated samples, the two replicates did not cluster together, so their averaged gene expression levels were used in the following steps. No batch effects were identified. Raw and normalized data are deposited in Gene Expression Omnibus (GSE42133). The BRB-ArrayTools filtering tool was used to obtain a final set of genes without missing expression values; the filtering criteria were log intensity variation (P > 0.05) and percent missing (>50% of subjects). In total, 142 final samples/arrays (87 ASD, 55 control), and thus 142 unique subject datasets, were deemed high quality and entered the expression analysis. The inter-array correlation (IAC) was 0.983.
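Three of the steps above can be illustrated on a genes × arrays matrix: flagging arrays whose average signal is more than 2 standard deviations below the mean, computing the inter-array correlation (IAC), and resolving technical replicates (keep array B when the replicates cluster together, otherwise average them). This is an illustrative NumPy sketch, not the lumi or BRB-ArrayTools implementation; the function names and toy data are assumptions, and `clustered_together` stands in for the outcome of the hierarchical clustering step.

```python
import numpy as np


def low_signal_arrays(expr: np.ndarray, n_sd: float = 2.0) -> np.ndarray:
    """Columns (arrays) whose average signal is more than n_sd SDs below the mean of all arrays."""
    avg = expr.mean(axis=0)
    cutoff = avg.mean() - n_sd * avg.std(ddof=1)
    return np.flatnonzero(avg < cutoff)


def mean_iac(expr: np.ndarray) -> float:
    """Inter-array correlation: mean Pearson correlation over all distinct pairs of arrays."""
    r = np.corrcoef(expr, rowvar=False)
    return float(r[~np.eye(r.shape[0], dtype=bool)].mean())


def resolve_replicates(rep_a: np.ndarray, rep_b: np.ndarray,
                       clustered_together: bool) -> np.ndarray:
    """Keep array B if the two replicates cluster together; otherwise average them."""
    return rep_b.copy() if clustered_together else (rep_a + rep_b) / 2.0


# Toy data: 500 genes x 10 arrays sharing one expression profile, array 3 shifted down.
rng = np.random.default_rng(0)
profile = rng.normal(8.0, 2.0, size=500)
expr = profile[:, None] + rng.normal(0.0, 0.3, size=(500, 10))
expr[:, 3] -= 3.0
print(low_signal_arrays(expr))   # [3]
print(round(mean_iac(expr), 3))
```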

Differentially expressed (DE; P < 0.05) genes were identified by class comparison (ASD versus control) in BRB-ArrayTools using a random variance model. The DE genes from the discovery toddlers were then used to identify differentially expressed pathways (Metacore) and a potential gene expression signature of ASD; this signature was then validated in the replication toddlers. Both discovery and replication datasets underwent the same filtering and normalization steps.
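The class comparison itself used a random variance model, which shrinks each gene's variance toward a common prior. As a simpler stand-in (explicitly not the random variance model), the sketch below scores each gene with a Welch t-statistic and obtains two-sided p-values by permuting the ASD/control labels, using only NumPy. The function names, permutation count and toy data are assumptions, and no multiple-testing correction is applied beyond the nominal P < 0.05 cutoff.

```python
import numpy as np


def welch_t(expr: np.ndarray, is_asd: np.ndarray) -> np.ndarray:
    """Per-gene Welch t-statistic, ASD minus control (rows = genes, columns = subjects)."""
    a, b = expr[:, is_asd], expr[:, ~is_asd]
    se2 = a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1]
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(se2)


def permutation_pvalues(expr: np.ndarray, is_asd: np.ndarray,
                        n_perm: int = 1000, seed: int = 0) -> np.ndarray:
    """Two-sided per-gene p-values from randomly permuting the ASD/control labels."""
    rng = np.random.default_rng(seed)
    t_obs = np.abs(welch_t(expr, is_asd))
    exceed = np.zeros(expr.shape[0])
    for _ in range(n_perm):
        exceed += np.abs(welch_t(expr, rng.permutation(is_asd))) >= t_obs
    return (exceed + 1.0) / (n_perm + 1.0)


# Toy data: 200 genes x 40 subjects; the first 10 genes are shifted upward in ASD.
rng = np.random.default_rng(1)
is_asd = np.array([True] * 20 + [False] * 20)
expr = rng.normal(0.0, 1.0, size=(200, 40))
expr[:10, is_asd] += 2.0
pvals = permutation_pvalues(expr, is_asd, n_perm=500)
de_genes = np.flatnonzero(pvals < 0.05)
print(len(de_genes), de_genes[:10])
```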

WGCNA and Association Analyses

The Weighted Gene Correlation Network Analysis (WGCNA) package^(23,24) was used to identify functional associations between gene modules and neuroanatomic measures across all discovery subjects. Co-expression analysis was run by selecting the lowest soft-thresholding power for which the scale-free topology fit index reached 0.90 and by constructing a signed (i.e., bidirectional) network, with a hybrid dynamic branch-cutting method used to assign individual genes to modules²⁵. Gene Significance (GS; the absolute value of the correlation between gene expression levels and a neuroanatomic measure) and Module Membership (MM; a measure of intramodular connectivity, i.e., co-expression across the genes within each biologically relevant module) were also computed using WGCNA. GS versus MM was computed to measure how gene activity patterns change between the ASD and control groups (see labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/ for manuals and further details). To identify gene-brain associations within each study group separately, the WGCNA analyses were also performed within the ASD and control groups of the discovery sample.
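The network quantities named above can be illustrated in NumPy: a signed adjacency a_ij = ((1 + cor_ij)/2)^β, the lowest power β whose scale-free topology fit reaches 0.90, a module eigengene (first principal component of the standardized module genes), and GS and MM. This is a simplified re-implementation for illustration, not the WGCNA package code (which provides `pickSoftThreshold` and `moduleEigengenes`); the binning in `scale_free_fit` follows the general idea of WGCNA's fit index (log10 bin frequency regressed on log10 mean bin connectivity) but may differ in detail, and all function names and toy data are hypothetical.

```python
import numpy as np


def signed_adjacency(expr: np.ndarray, beta: int) -> np.ndarray:
    """Signed adjacency ((1 + cor) / 2) ** beta between genes (rows = samples, cols = genes)."""
    cor = np.corrcoef(expr, rowvar=False)
    return ((1.0 + cor) / 2.0) ** beta


def scale_free_fit(adj: np.ndarray, n_bins: int = 10) -> float:
    """Simplified signed scale-free fit: R^2 of log10 p(k) on log10 k, times -sign(slope)."""
    k = adj.sum(axis=1) - 1.0                     # connectivity; drop the self-adjacency of 1
    edges = np.linspace(k.min(), k.max(), n_bins + 1)
    bins = np.digitize(k, edges[1:-1])            # bin index 0 .. n_bins-1
    counts = np.bincount(bins, minlength=n_bins)
    used = np.flatnonzero(counts)
    x = np.log10([k[bins == i].mean() for i in used])
    y = np.log10(counts[used] / k.size)
    slope, intercept = np.polyfit(x, y, 1)
    r2 = 1.0 - np.var(y - (slope * x + intercept)) / np.var(y)
    return float(-np.sign(slope) * r2)


def pick_soft_power(expr: np.ndarray, powers=range(1, 21), target: float = 0.90):
    """Lowest power whose scale-free fit index reaches the target (None if none does)."""
    for beta in powers:
        if scale_free_fit(signed_adjacency(expr, beta)) >= target:
            return beta
    return None


def module_eigengene(module_expr: np.ndarray) -> np.ndarray:
    """First principal component of the standardized module genes, sign-aligned to their average."""
    z = (module_expr - module_expr.mean(axis=0)) / module_expr.std(axis=0, ddof=1)
    me = np.linalg.svd(z, full_matrices=False)[0][:, 0]
    return me if np.corrcoef(me, z.mean(axis=1))[0, 1] >= 0 else -me


def gene_significance(expr: np.ndarray, trait: np.ndarray) -> np.ndarray:
    """GS: absolute correlation of each gene with a trait (e.g. a neuroanatomic measure)."""
    return np.abs([np.corrcoef(expr[:, j], trait)[0, 1] for j in range(expr.shape[1])])


def module_membership(module_expr: np.ndarray, eigengene: np.ndarray) -> np.ndarray:
    """MM: correlation of each module gene with the module eigengene."""
    return np.array([np.corrcoef(module_expr[:, j], eigengene)[0, 1]
                     for j in range(module_expr.shape[1])])


# Toy data: 60 samples, a 20-gene module driven by one latent factor plus 30 noise genes.
rng = np.random.default_rng(2)
latent = rng.normal(size=60)
module = latent[:, None] + 0.5 * rng.normal(size=(60, 20))
expr = np.hstack([module, rng.normal(size=(60, 30))])
trait = latent + 0.5 * rng.normal(size=60)           # stand-in for a brain measure
me = module_eigengene(module)
print(pick_soft_power(expr))
print(module_membership(module, me).round(2))
```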

Hypergeometric and Venn Analyses

Hypergeometric distribution analysis was performed using the function sum(dhyper()) in R. The total numbers of human genes from which random gene sets of equal size were drawn to test the significance of the identified gene sets were: 21,405 for the enrichment analyses (all genes annotated in the Metacore database), 20,151 for the Venn analyses involving the DE genes and gene modules (all genes passing the pre-processing analysis of the discovery study) and 26,210 for the Venn analysis of CNV gene content (all RefSeq human genes currently mapped and present on the Illumina HumanHT-12 v4 platform). The number of unique genes within autism-relevant CNVs below 1 Mb in size was 4,611, obtained from analysis of the AutDB database (see mindspec.org/autdb.html). Only cases strictly annotated as ASD with or without additional features (for example, mental retardation or neurocognitive impairment) were selected. Cases annotated as intellectual disability, developmental delay, language delay, Asperger syndrome, broad spectrum autism, bipolar disorder or learning disability were not selected, even if associated with autistic features. Only CNVs from UCSC build 36 (Human Genome 18) were selected. Venn analysis was performed using the online tool at pangloss.com/seidel/Protocols/venn.cgi.
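In R, `sum(dhyper(x:min(m, k), m, n, k))` gives the probability of observing at least x members of a reference set of m genes among k genes drawn from a background of m + n genes. The equivalent exact calculation in plain Python (standard library only) is sketched below; a background size such as 21,405, 20,151 or 26,210 from the text would be passed as `background`. The 300-gene module and the overlap of 80 in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(overlap: int, set_size: int, draw_size: int, background: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(background, set_size, draw_size).

    Mirrors R's sum(dhyper(overlap:min(set_size, draw_size), set_size,
    background - set_size, draw_size)); exact integer arithmetic, standard library only.
    """
    upper = min(set_size, draw_size)
    tail = sum(comb(set_size, i) * comb(background - set_size, draw_size - i)
               for i in range(overlap, upper + 1))
    return tail / comb(background, draw_size)


# Hypothetical example: a 300-gene module overlapping 80 of the 4,611 autism-CNV genes,
# tested against the 26,210-gene background used for the CNV Venn analysis.
expected = 300 * 4611 / 26210
p = hypergeom_upper_tail(80, 4611, 300, 26210)
print(f"expected overlap {expected:.1f}, P(X >= 80) = {p:.2e}")
```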

Classifier and Performance Analysis

Twelve module eigengenes were obtained from the WGCNA analysis of the 2,765 DE genes in the discovery sample. The modules used in the classifier were selected on the basis of AUC performance after logistic regression in the same sample. First, the pair of modules that best distinguished ASD from control subjects was identified. Next, each remaining module was tested to determine whether adding it increased or decreased performance, and the module was retained only if performance increased. The resulting four modules (blue, black, purple and greenyellow) displayed the best AUC performance and were used to independently validate the classifier.
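The selection procedure above (best pair first, then keep each added module only if AUC rises) is a greedy forward search. The sketch below implements it with a small NumPy logistic regression fitted by Newton-Raphson and a rank-based AUC, both evaluated in the same sample as described. The helper names, the small ridge term added for numerical stability and the toy eigengene matrix are assumptions, not details from the source.

```python
import itertools

import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, ridge: float = 1e-3, n_iter: int = 50) -> np.ndarray:
    """Logistic regression with intercept, fitted by Newton-Raphson with a small ridge term."""
    Xd = np.column_stack([np.ones(len(X)), X])
    b = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xd @ b, -30, 30)))
        grad = Xd.T @ (y - p) - ridge * b
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None]) + ridge * np.eye(len(b))
        b = b + np.linalg.solve(hess, grad)
    return b


def auc(scores: np.ndarray, y: np.ndarray) -> float:
    """Rank-based AUC: share of (case, control) pairs where the case scores higher (ties = 1/2)."""
    pos, neg = scores[y == 1], scores[y == 0]
    wins = (pos[:, None] > neg).sum() + 0.5 * (pos[:, None] == neg).sum()
    return float(wins / (len(pos) * len(neg)))


def in_sample_auc(E: np.ndarray, y: np.ndarray, cols) -> float:
    """Fit on the selected eigengene columns and score AUC in the same sample."""
    X = E[:, list(cols)]
    b = fit_logistic(X, y)
    return auc(np.column_stack([np.ones(len(X)), X]) @ b, y)


def greedy_module_selection(E: np.ndarray, y: np.ndarray, names):
    """Best pair first, then add each remaining module only if it raises the AUC."""
    pair = max(itertools.combinations(range(E.shape[1]), 2),
               key=lambda c: in_sample_auc(E, y, c))
    selected, best = list(pair), in_sample_auc(E, y, pair)
    for j in range(E.shape[1]):
        if j in selected:
            continue
        score = in_sample_auc(E, y, selected + [j])
        if score > best:
            selected.append(j)
            best = score
    return [names[j] for j in selected], best


# Toy data: 120 subjects x 12 module eigengenes; M0 and M1 carry diagnostic signal.
rng = np.random.default_rng(3)
y = np.repeat([0, 1], 60)
E = rng.normal(size=(120, 12))
E[:, 0] += 1.0 * y
E[:, 1] -= 0.8 * y
names = [f"M{i}" for i in range(12)]
chosen, best_auc = greedy_module_selection(E, y, names)
print(chosen, round(best_auc, 3))
```

Because in-sample AUC tends to reward extra predictors, a search like this can retain modules by chance, which is why validation on an independent replication sample matters.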

To validate the classifier, gene weights were calculated for the genes of the selected modules from their correlation with the eigengene values. The weights were applied to the gene expression levels of each replication subject, and the resulting eigengenes were computed and used in the logistic regression to independently validate classification performance. Clinical and MRI characteristics of the correctly classified and misclassified groups (ASD and control) were compared to determine whether the classifier was sensitive to these measures. Mullen, ADOS and Vineland scores were compared, as were residual brain volumes for total brain volume, cerebral white and gray matter, and cerebellar white and gray matter.
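One plausible reading of this validation step is sketched below: each gene's weight is its correlation with the module eigengene in the discovery sample; a subject's module score is the weighted average of that subject's expression standardized to discovery means and SDs; and the logistic model fitted on discovery scores is applied unchanged to replication scores. The source does not specify the exact weighting or standardization, so those choices, the normalization by the sum of absolute weights, all function names and the simulated data are assumptions.

```python
import numpy as np


def eigengene(module_expr):
    """First principal component of the standardized module genes (rows = subjects)."""
    z = (module_expr - module_expr.mean(axis=0)) / module_expr.std(axis=0)
    u = np.linalg.svd(z, full_matrices=False)[0][:, 0]
    return u if np.corrcoef(u, z.mean(axis=1))[0, 1] >= 0 else -u


def gene_weights(module_expr, me):
    """Weight of each gene = its Pearson correlation with the module eigengene."""
    z = (module_expr - module_expr.mean(axis=0)) / module_expr.std(axis=0)
    z_me = (me - me.mean()) / me.std()
    return (z * z_me[:, None]).mean(axis=0)


def module_score(expr, w, ref_mean, ref_sd):
    """Weighted average of expression standardized with the discovery means and SDs."""
    return ((expr - ref_mean) / ref_sd) @ w / np.abs(w).sum()


def fit_logistic(X, y, ridge=1e-3, n_iter=50):
    """Logistic regression with intercept via Newton-Raphson (small ridge for stability)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    b = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xd @ b, -30, 30)))
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None]) + ridge * np.eye(len(b))
        b = b + np.linalg.solve(hess, Xd.T @ (y - p) - ridge * b)
    return b


def auc(scores, y):
    """Rank-based AUC (ties count 1/2)."""
    pos, neg = scores[y == 1], scores[y == 0]
    return float(((pos[:, None] > neg).sum() + 0.5 * (pos[:, None] == neg).sum())
                 / (len(pos) * len(neg)))


def validate(disc_expr, y_disc, rep_expr, y_rep, modules):
    """Learn weights and the logistic model on discovery; report AUC on replication."""
    mu, sd = disc_expr.mean(axis=0), disc_expr.std(axis=0)
    weights = [gene_weights(disc_expr[:, idx], eigengene(disc_expr[:, idx])) for idx in modules]

    def scores(expr):
        return np.column_stack([module_score(expr[:, idx], w, mu[idx], sd[idx])
                                for idx, w in zip(modules, weights)])

    b = fit_logistic(scores(disc_expr), y_disc)
    linear = np.column_stack([np.ones(len(rep_expr)), scores(rep_expr)]) @ b
    return auc(linear, y_rep)


def simulate(n, rng):
    """Toy data: two 10-gene modules, one shifted up and one down in cases."""
    y = rng.integers(0, 2, n)
    latent = np.column_stack([y + rng.normal(size=n), -y + rng.normal(size=n)])
    return np.repeat(latent, 10, axis=1) + rng.normal(size=(n, 20)), y


rng = np.random.default_rng(4)
modules = [np.arange(0, 10), np.arange(10, 20)]
rep_auc = validate(*simulate(120, rng), *simulate(80, rng), modules)
print(round(rep_auc, 3))
```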

TABLE 26 Pearson and Spearman correlations of module eigengenes and diagnosis (Dx)

| Module | Dx (Pearson/Spearman) | Top network |
|---|---|---|
| Green | −0.18*/ns | Inflammation_interferon signaling |
| Black | 0.24**/0.2^ | Translation_Translation initiation |
| Magenta | −0.24**/−0.25^^ | ns |
| Purple | −0.26**/−0.32^^^ | Cell cycle_Meiosis |
| Salmon | −0.39***/−0.4^^^ | ns |
| MidnightBlue | 0.18*/0.18^ | Cell adhesion_integrin-mediated |
| LightCyan | −0.37***/−0.34^^^ | ns |
| DarkRed | −0.2*/−0.20^ | ns |

Signif. codes: Pearson p-value *** <0.001; ** <0.01; * <0.05. Spearman p-value ^^^ <0.001; ^^ <0.01; ^ <0.05. ns = not significant enrichment.
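Table 26 reports both a Pearson and a Spearman correlation between each module eigengene and diagnosis coded as a binary variable. Because diagnosis takes only two values, the Spearman calculation must handle heavy ties with average ranks. The NumPy sketch below assumes Dx is coded 1 = ASD and 0 = control (the coding direction is an assumption and determines each sign); the eigengene values are hypothetical and significance testing is omitted.

```python
import numpy as np


def average_ranks(x) -> np.ndarray:
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(a, b) -> float:
    return float(np.corrcoef(a, b)[0, 1])


def spearman(a, b) -> float:
    """Spearman correlation = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical eigengene values for 4 controls (Dx = 0) and 4 ASD subjects (Dx = 1).
dx = np.array([0, 0, 0, 0, 1, 1, 1, 1])
eig = np.array([0.12, -0.05, 0.30, 0.02, -0.20, -0.31, 0.05, -0.18])
print(round(pearson(eig, dx), 2), round(spearman(eig, dx), 2))
```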

REFERENCES

-   1. Wetherby A M, Allen L, Cleary J, Kublin K, Goldstein H. Validity and reliability of the communication and symbolic behavior scales developmental profile with very young children. Journal of speech, language, and hearing research: JSLHR 2002; 45:1202-18.
-   2. Pierce K, Carter C, Weinfeld M, et al. Detecting, studying, and treating autism early: the one-year well-baby check-up approach. The Journal of pediatrics 2011; 159:458-65 e1-6.
-   3. Luyster R, Gotham K, Guthrie W, et al. The Autism Diagnostic Observation Schedule-toddler module: a new module of a standardized diagnostic measure for autism spectrum disorders. Journal of Autism and Developmental Disorders 2009; 39:1305-20.
-   4. Chawarska K, Klin A, Paul R, Macari S, Volkmar F. A prospective study of toddlers with ASD: short-term diagnostic and cognitive outcomes. J Child Psychol Psychiatry 2009; 50:1235-45.
-   5. Cox A, Klein K, Charman T, et al. Autism spectrum disorders at 20 and 42 months of age: stability of clinical and ADI-R diagnosis. J Child Psychol Psychiatry 1999; 40:719-32.
-   6. Kleinman J M, Ventola P E, Pandey J, et al. Diagnostic stability in very young children with autism spectrum disorders. J Autism Dev Disord 2008; 38:606-15.
-   7. Mullen E M. Mullen Scales of Early Learning. AGS ed. MN: American Guidance Service Inc.; 1995.
-   8. Sparrow S, Cicchetti D V, Balla D A. Vineland Adaptive Behavior Scales. Second Edition. Survey Forms Manual. Pearson Assessments 2005.
-   9. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. Fourth Edition. American Psychiatric Association 2000.
-   10. Schroeder A, Mueller O, Stocker S, et al. The RIN: an RNA integrity number for assigning integrity values to RNA measurements. BMC Mol Biol 2006; 7:3.
-   11. Eyler L T, Pierce K, Courchesne E. A failure of left temporal cortex to specialize for language is an early emerging and fundamental property of autism. Brain 2012; 135:949-60.
-   12. Jenkinson M, Smith S. A global optimisation method for robust affine registration of brain images. Medical image analysis 2001; 5:143-56.
-   13. Smith S M. Fast robust automated brain extraction. Hum Brain Mapp 2002; 17:143-55.
-   14. Zhang Y, Brady M, Smith S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE transactions on medical imaging 2001; 20:45-57.
-   15. Altaye M, Holland S K, Wilke M, Gaser C. Infant brain probability templates for MRI segmentation and normalization. NeuroImage 2008; 43:721-30.
-   16. Zhao L, Ruotsalainen U, Hirvonen J, Hietala J, Tohka J. Automatic cerebral and cerebellar hemisphere segmentation in 3D MRI: adaptive disconnection algorithm. Medical image analysis 2010; 14:360-72.
-   17. Rivière D G D, Denghien I, Souedet N, Cointepas Y. BrainVISA: an extensible software environment for sharing multimodal neuroimaging data and processing tools. NeuroImage 2009; 47:S163.
-   18. Du P, Kibbe W A, Lin S M. lumi: a pipeline for processing Illumina microarray. Bioinformatics (Oxford, England) 2008; 24:1547-8.
-   19. Gentleman R C, Carey V J, Bates D M, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 2004; 5:R80.
-   20. Chow M L, Li H R, Winn M E, et al. Genome-wide expression assay comparison across frozen and fixed postmortem brain tissue samples. BMC genomics 2011; 12:449.
-   21. Chow M L, Pramparo T, Winn M E, et al. Age-dependent brain gene expression and copy number anomalies in autism suggest distinct pathological processes at young versus mature ages. PLoS Genet 2012; 8:e1002592.
-   22. Oldham M C, Konopka G, Iwamoto K, et al. Functional organization of the transcriptome in human brain. Nature Neuroscience 2008; 11:1271-82.
-   23. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008; 9:559.
-   24. Langfelder P, Horvath S. Eigengene networks for studying the relationships between co-expression modules. BMC systems biology 2007; 1:54.
-   25. Pramparo T, Libiger O, Jain S, et al. Global developmental gene expression and pathway analysis of normal brain development and mouse models of human neuronal migration defects. PLoS Genet 2011; 7:e1001331.

Example 3 Age-Related Changes in Gene Expression in ASD and Non-ASD Controls

Age-related changes in ASD signature genes from infancy to young childhood were analyzed and compared with non-ASD controls. We discovered several patterns of age-dependent expression change across ASD signature genes, including but not limited to the following three. First, some genes showed main effects of diagnosis (ASD vs. control) and no statistically significant age-related changes (FIG. 13A; ASD in light grey, control in dark grey). For these genes (a minority of all ASD signature genes), absolute expression level predicted diagnostic classification regardless of age at testing. Second, other genes showed main effects of diagnosis plus main effects of age (FIG. 13B); these represented a large portion of all ASD signature genes. For these genes, absolute expression level alone could give an erroneous classification unless age at testing was taken into account. Third, still other signature genes showed an interaction between age and diagnosis (FIG. 13C): at some ages expression levels were greater in ASD than in controls, at other ages they did not differ significantly, and at still other ages expression levels in controls exceeded those in ASD. A large portion of signature genes fell into this category. The age at which the ASD and control expression trajectories intersected varied across genes, with some intersecting at early ages, others at 2-3 years and others after 2-3 years of age. For these genes, absolute expression level alone will give a completely erroneous classification unless age at testing is computationally taken into account.
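The three patterns above map onto terms of a per-gene linear model, expression = b0 + b1·Dx + b2·age + b3·(Dx × age), with Dx = 1 for ASD and 0 for control: a diagnosis effect only (b1), a diagnosis plus age effect (b1 and b2), or an interaction (b3). Under such a model the ASD and control trajectories cross where b1 + b3·age = 0, that is at age = −b1/b3. The sketch below fits the model by ordinary least squares; the linear-in-age form, the function names and the noise-free toy gene (ages in months) are assumptions, since the source does not state which model was used, and significance testing is omitted.

```python
import numpy as np


def fit_age_dx_model(expr: np.ndarray, dx: np.ndarray, age: np.ndarray) -> np.ndarray:
    """OLS fit of expr ~ 1 + dx + age + dx:age for one gene; returns [b0, b1, b2, b3]."""
    X = np.column_stack([np.ones_like(age), dx, age, dx * age])
    coef, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return coef


def crossing_age(coef: np.ndarray):
    """Age where the ASD and control lines meet: b1 + b3 * age = 0 (None if parallel)."""
    b1, b3 = coef[1], coef[3]
    return None if np.isclose(b3, 0.0) else float(-b1 / b3)


# Noise-free toy gene (ages in months): ASD higher early, control higher later.
age = np.linspace(12.0, 48.0, 40)
dx = np.tile([0.0, 1.0], 20)            # 1 = ASD, 0 = control
expr = 5.0 + 1.2 * dx + 0.01 * age - 0.05 * dx * age
coef = fit_age_dx_model(expr, dx, age)
print(np.round(coef, 3), crossing_age(coef))  # trajectories cross at about 24 months
```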

The knowledge we developed of these age-dependent changes in the expression level of each signature gene is incorporated into the WGERD and computationally combined with the weighted gene expression values so that, with age as a predictor for each gene, we obtain optimized age-specific signatures of ASD. Given the child's age at the time of bioassay and the expression level of each gene, the program calculates age-adjusted weighted gene expression values for the child for comparison with the WGERD age-adjusted weighted gene expression signature. Across different numbers of signature genes (i.e., 10, 20, 40, 80, 160, etc.), age-adjusted expression signatures outperform expression signatures without any age correction by 4% to 10%. See FIG. 14 for one example of the invention's performance enhancement when knowledge of age effects is combined with gene expression (FIG. 14a) versus when per-gene age-adjusted calculations are not used (FIG. 14b), as well as Table 3 above.
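One way to compute an age-adjusted weighted expression value, sketched here under stated assumptions, is to express each gene relative to its expected level at the child's age in a reference group and then apply the gene-specific weight. The sketch assumes a linear age trend per gene fitted in reference subjects and scales by the residual SD; the WGERD's actual age model is not specified in the text, and `AgeModel`, `fit_age_model`, `age_adjusted_weighted_values`, the weights and the simulated data are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class AgeModel:
    """Per-gene linear age trend in a reference group (hypothetical structure)."""
    intercept: np.ndarray
    slope: np.ndarray
    resid_sd: np.ndarray


def fit_age_model(ref_expr: np.ndarray, ref_age: np.ndarray) -> AgeModel:
    """Fit expression = intercept + slope * age for every gene (rows = subjects, cols = genes)."""
    X = np.column_stack([np.ones_like(ref_age), ref_age])
    coef, *_ = np.linalg.lstsq(X, ref_expr, rcond=None)      # shape (2, n_genes)
    resid = ref_expr - X @ coef
    return AgeModel(coef[0], coef[1], resid.std(axis=0, ddof=2))


def age_adjusted_weighted_values(expr: np.ndarray, age: float,
                                 model: AgeModel, weights: np.ndarray) -> np.ndarray:
    """weight * (observed - expected at this age) / residual SD, one value per gene."""
    expected = model.intercept + model.slope * age
    return weights * (expr - expected) / model.resid_sd


# Toy reference: 50 subjects aged 12-48 months, 3 genes with different age slopes.
rng = np.random.default_rng(6)
ref_age = np.linspace(12.0, 48.0, 50)
ref_expr = 2.0 + np.outer(ref_age, [0.10, 0.0, -0.05]) + rng.normal(0.0, 0.2, (50, 3))
model = fit_age_model(ref_expr, ref_age)
weights = np.array([0.8, -0.6, 0.5])        # hypothetical gene-specific weights
child = model.intercept + model.slope * 30.0 + 2.0 * model.resid_sd   # 2 SD above expected
print(age_adjusted_weighted_values(child, 30.0, model, weights))       # about 2 * weights
```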

Example 4 Weighted Gene Expression Values in Combination with GeoPref Test Score

The magnitude of the problem articulated above in the Background section is substantial and immediate: given current prevalence rates, between 52,000 and 84,000 of the babies born each year will go on to develop ASD. There is therefore an immediate need for feasible, practical, cost-effective and clinically effective biological ASD tests that lower the age of accurate and specific detection, evaluation and referral to the youngest possible age in real-world community settings. Procedures with poor ASD specificity worsen the problem. Procedures that lack sensitivity leave a huge number of babies undetected and undiagnosed, which also fails to address the magnitude of the problem. Expensive tests, such as whole genome sequencing, fail to address the problem because their cost prevents broad use.

In brief, prior methods have not delivered screening, detection and diagnostic evaluation approaches that are easy, quick and cost-effective to implement in ordinary community settings anywhere and by staff ordinarily present in the clinics. These methods lack the high ASD specificity and very good sensitivity needed so that a large portion of all true ASD cases among babies, 1- to 2-year-olds, 2- to 3-year-olds and 3- to 4-year-olds are detected and correctly diagnosed, while as few non-ASD babies as possible are misdiagnosed as ASD.

The methods of the invention provide the first procedure with a surprisingly high level of specificity and very good sensitivity that is also easy, quick and cost-effective. In some embodiments, the invention achieves this with a novel method that combines gene expression, as described above, with GeoPref test data and signatures in the MMSM.

The GeoPref Test is fully described in Pierce et al. (2011). In brief, the GeoPref Test is a simple and quick 1-minute eye-tracking test that can be administered as a screening or evaluation test to individuals in the general pediatric population. Babies, infants, toddlers and young children are shown a computer screen that displays colorful moving patterns on one side (the “Geo” side) and lively moving children on the other (the “Social” side). Eye tracking and scoring of how much time a child looks at each side are automated. A child who looks at the “Geo” side for more than a threshold amount of time during the 1-minute test is considered a Geo preference (“GeoPref”) responder. GeoPref responders among babies, infants, toddlers and young children have a 99% chance of being ASD, but only 20 to 30% of all ASD cases are detected by this test.
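The automated GeoPref scoring can be pictured as the share of on-screen looking time spent on the Geo side, compared against a threshold. The sketch below assumes gaze samples recorded at a fixed rate and already labeled by screen side, so sample counts are proportional to time; the "geo"/"social"/"offscreen" labels and function names are hypothetical, and the threshold is left as a caller-supplied parameter because its value is defined in Pierce et al. (2011) and not restated here.

```python
from collections import Counter
from typing import Iterable


def geo_fraction(gaze_labels: Iterable[str]) -> float:
    """Share of on-screen gaze samples on the Geo side; 'offscreen' samples are ignored."""
    counts = Counter(gaze_labels)
    on_screen = counts["geo"] + counts["social"]
    if on_screen == 0:
        raise ValueError("no valid on-screen gaze samples")
    return counts["geo"] / on_screen


def is_geopref_responder(gaze_labels: Iterable[str], threshold: float) -> bool:
    """True if the Geo share exceeds the study-defined threshold (supplied by the caller)."""
    return geo_fraction(gaze_labels) > threshold


# Hypothetical gaze record sampled at a fixed rate over the 1-minute test.
gaze = ["geo"] * 42 + ["social"] * 12 + ["offscreen"] * 6
print(round(geo_fraction(gaze), 3))   # 0.778
```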

By computationally combining the weighted gene expression values and the GeoPref score of a child, a gene expression-GeoPref signature of the child is obtained. This signature is compared with the MMSM reference database, and a score for the child's ASD risk is computed from the divergence of the child's GeoPref MMSM signature from the GeoPref MMSM reference database. In one embodiment of this procedure, accuracy remains at 85% and sensitivity drops slightly to 72%, but ASD specificity is 98%. This is the highest overall performance of any previous biological or biobehavioral ASD test applied at any age from birth to 4 years. Importantly, this combined WGSM/MMSM signature can have a very high beneficial impact in screening and diagnostic evaluation, because it not only detects a large portion of the general pediatric ASD population at young ages via a simple, quick 1-minute test plus an ordinary blood draw for a gene expression bioassay, but also has an extremely high correct detection rate and a very low false positive rate. Thus, it addresses in a meaningful way the need for early and correct detection and diagnostic determination of ASD among the 52,000 to 84,000 babies born every year in the US who develop ASD.
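The accuracy, sensitivity and specificity figures reported for the combined test follow standard definitions over a 2 × 2 table of true versus predicted status. The sketch below computes them (plus positive predictive value) from predictions; it does not reproduce the combination rule for gene expression and GeoPref, which the text does not give, and the 20 predictions in the example are hypothetical.

```python
from typing import Dict, Sequence


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and PPV with 1 = ASD, 0 = non-ASD."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of true ASD cases detected
        "specificity": tn / (tn + fp),   # share of non-ASD children not flagged
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),  # share of flagged children with ASD
    }


# Hypothetical predictions for 10 ASD and 10 non-ASD children.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 7 + [0] * 3 + [0] * 9 + [1]
print(screening_metrics(y_true, y_pred))
```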

REFERENCES

-   1. Pierce, K., Conant, D., Hazin, R., Stoner, R. & Desmond, J.    Preference for geometric patterns early in life as a risk factor for    autism. Archives of General Psychiatry 68, 101-9 (2011).

Example 5 Weighted Gene Expression Values in Combination with Protein Signatures of ASD

ASD and other diseases are manifested by changes in gene expression, in metabolite profiles, and in the expression, post-translational processing and protein and small-molecule interactions among the cellular and non-cellular constituents of blood and other tissues. The correlation between gene expression and the level of any particular protein or modified variant thereof varies widely. Variations in the levels of particular proteins and protein variants have been found to correlate with disease and disease progression in numerous examples. Additionally, the poor correlation between gene expression and patterns of relative abundance of protein variants suggests that production of protein variants is subject to different aspects of disease biology than is gene expression, and further suggests that measuring patterns of protein variants in blood and other tissues could be a valuable adjunct assessment of disease in combination with weighted gene expression.

Therefore, in certain embodiments, the MMSM includes, but is not limited to, assays of proteins in peripheral blood. As with RNA tests, only a subset of blood proteins is likely to change in ways that make their measurement informative for diagnosing autism. Simple changes in the abundance of certain proteins may be correlated with ASD, and measuring the concentration of one or more of these proteins, either directly in blood or after extraction from blood, can have diagnostic value. Useful measurement techniques span a range of specificities and technical approaches. Highly specific measurement of proteins derived from specific unique genes can use antibody reagents to quantify particular protein species. The same approach can be extended to large numbers of different proteins by using collections of antibodies that target multiple protein species, enabling measurement of the abundance of larger groups of proteins in blood. For diagnostic assays, each antibody would be chosen to recognize protein species that vary in abundance or quality as a function of ASD status, and this relationship to ASD would be established by experiment. Analogous to the weighted gene signatures used in developing our diagnostic RNA signature, the weighted expression signature of multiple proteins can also be used to combine the ASD-related changes in these proteins into a molecular fingerprint of ASD. A further extension, beyond using simple abundance measurements as a weighted diagnostic signature, is to find and use ASD-associated changes in other protein properties as diagnostic molecular signatures.

In addition to abundance, these other informative changes include changes in protein post-translational modification, three-dimensional protein conformation, complex formation with other serum components (other protein or non-protein components of blood) and the ability to interact with ligands (e.g., proteins or small molecules that can bind the proteins changed by ASD). Assays that discover these ASD-related changes in protein abundance or properties can also be incorporated directly or indirectly into diagnostic assays.

Protein signatures of ASD can be discovered by a large number of combinations of fractionation and analysis techniques. Whole-blood proteins may be analyzed directly (for example, using ForteBio Octet or other immunodetection systems), or the cellular and non-cellular fractions can be separated and analyzed separately, with variable levels of fractionation of both the cellular and plasma fractions. In general, analysis of the proteins within a fraction becomes easier as fractionation reduces the fraction's complexity, but some analytical techniques can work directly on unfractionated or less fractionated samples. There is a long history of development of new protein extraction and fractionation techniques for the research and commercial fractionation, purification and analysis of proteins, whether to answer research questions or to produce protein products. In general, proteins can be fractionated by solubility (e.g., by ammonium sulfate fractional precipitation or by partitioning between solvents of differing composition), by binding affinity for functionalized surfaces (e.g., selecting protein fractions with differing affinity for ion exchange or reverse-phase matrices in HPLC, or for more specific affinity reagents such as antibodies coupled to solid-phase substrates or small-molecule-derivatized surfaces), or by migration characteristics in sieving matrices (e.g., size exclusion chromatography or electrophoresis). The affinity reagents used to capture and quantify specific protein species can be general (binding all variants of the protein product of a particular gene), or they can be specific for particular variants derived by post-translational processing, conformational change or liganding (e.g., antibodies specific for post-translationally modified forms of a protein). Once separated, the proteins can be analyzed by a number of techniques to identify and quantify particular proteins. Those skilled in the art would use mass spectrometry to define the genetic identity and quantity of intact or fragmented proteins within a mixture, or would use antibodies or other specific affinity reagents to quantify these proteins.

As an example, we searched for protein biomarkers of ASD by performing immunoassays for the following 9 analytes: TNF-α, IL-6, IL-10, IP-10, sIL-6R, sFas, VEGF, sVEGFR-1 and tPAI-1, in serum samples from the following collection of 142 pediatric patients presenting for clinical assessment of ASD status:

| All | Typical | Language Delayed (LD) | ASD |
|---|---|---|---|
| 142 | 66 | 27 | 49 |

The results of these analyses suggested that abnormal levels of sFas (elevated) and of VEGF, sIL-6R and IL-6 (all reduced) are significantly associated with ASD relative to typically developing (TD) patients. This demonstrates that there are multiple protein biomarkers of ASD, and integrating measurements of these protein changes into combination tests for ASD (e.g., combining weighted gene expression signatures, behavioral tests and measurements of blood protein composition) is expected to enhance overall test performance. Extending this discovery approach to larger and more complex patient sets, and to additional combinations of fractionation, detection and protein identification, will expand this list of diagnostically relevant protein changes; which tests to incorporate into combined assays is determined by prospective clinical trials, as with the initial discovery of the weighted gene expression signatures. These results are a proof-of-principle demonstration that serum levels of proteins and protein variants can change as a function of ASD status, and that measurement of these levels can therefore be used as additional diagnostic assays in conjunction with the WGSM in the MMSM.
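By analogy with the weighted gene signature, a protein signature can be sketched as a weighted sum of serum levels standardized to a reference group. In the sketch below, only the direction of each weight comes from the results above (sFas elevated; VEGF, sIL-6R and IL-6 reduced in ASD); the unit magnitudes, the reference means and SDs, the subject's values and the function name are hypothetical placeholders, and in practice weights would be estimated from reference data as with the WGSM.

```python
from typing import Dict, Mapping

# Direction of change in ASD relative to TD reported in this example:
# sFas elevated; VEGF, sIL-6R and IL-6 reduced. Unit magnitudes are a placeholder.
DIRECTION_WEIGHTS: Dict[str, float] = {"sFas": 1.0, "VEGF": -1.0, "sIL-6R": -1.0, "IL-6": -1.0}


def protein_signature_score(values: Mapping[str, float],
                            ref_mean: Mapping[str, float],
                            ref_sd: Mapping[str, float],
                            weights: Mapping[str, float] = DIRECTION_WEIGHTS) -> float:
    """Weighted sum of z-scored serum levels; higher values look more ASD-like."""
    return sum(w * (values[p] - ref_mean[p]) / ref_sd[p] for p, w in weights.items())


# Hypothetical TD reference means/SDs and one subject (arbitrary units).
ref_mean = {"sFas": 10.0, "VEGF": 200.0, "sIL-6R": 40.0, "IL-6": 2.0}
ref_sd = {"sFas": 2.0, "VEGF": 50.0, "sIL-6R": 8.0, "IL-6": 0.5}
subject = {"sFas": 14.0, "VEGF": 150.0, "sIL-6R": 40.0, "IL-6": 2.0}
print(protein_signature_score(subject, ref_mean, ref_sd))   # 3.0
```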

Other embodiments and uses are apparent to one skilled in the art in light of the present disclosures. Those skilled in the art will appreciate that numerous changes and modifications can be made to the embodiments of the invention and that such changes and modifications can be made without departing from the spirit of the invention. It is, therefore, intended that the appended claims cover all such equivalent variations as fall within the true spirit and scope of the invention.

We claim:
 1. A method of conducting a weighted gene and feature test of autism (WGFTA) for autism screening, diagnosis or prognosis, comprising: a) obtaining an analyte from a biological sample of a subject to obtain analyte-associated gene expression levels of a set of at least 20 or more genes selected from a model derived from an autism reference database; b) statistically normalizing each expression level of the selected set of genes expressed to derive a normalized gene expression value (NGEV) for each gene in the selected set; c) preparing a weighted gene signature matrix (WGSM) of the selected gene set from the autism reference database; d) calculating a weighted gene expression level of each gene in the selected set by multiplying the NGEV for each gene by a gene-specific weight of that gene, wherein the gene-specific weights are derived from a computer-based bioinformatic analysis of the relative expression levels of at least the selected set of genes compiled in a weighted gene expression reference database (WGERD); and e) establishing the divergence of the set of each weighted gene expression level of the subject to the WGERD, to thereby conduct the WGFTA to indicate increasing correlation with autism risk, diagnosis or prognosis.
 2. The method of claim 1, wherein the genes are selected and arranged into 4 sets of at least 10 genes each as shown in Table 1.
 3. The method of claim 1, wherein the preparing the WGSM comprises expression levels of a selected set of 50 or more genes expressed in the biological sample of the subject selected from the genes listed in Table 1.
 4. The method of claim 1, wherein the model derived from the autism reference database comprises at least the selected set of genes from more than 40 healthy individuals and 40 autistic individuals compiled in the WGERD.
 5. The method of claim 1, wherein the genes are selected from genes in sets 1 through 4 with absolute value of weights ranging from about 0.50 to about 1.00 as shown in Tables 1.1 through 1.4.
 6. The method of claim 1, wherein said genes are selected from genes listed in Tables 2 and 16 through 25.
 7. The method of claim 1, wherein said genes are selected from genes involved in cell cycle, protein folding, cell adhesion, translation, DNA damage response, apoptosis, immune/inflammation functions, signal transduction ESR1-nuclear pathway, transcription-mRNA processing, cell cycle meiosis, cell cycle G2-M, cell cycle mitosis, cytoskeleton-spindle microtubule and cytoskeleton-cytoplasmic microtubule functions.
8. The method of claim 1, further comprising comparing a gene-network signature matrix (GNSM) of the subject to a GNSM reference database, to conduct the WGFTA for autism risk screening, diagnosis or prognosis based on the divergence of the subject's GNSM to the GNSM reference database, wherein said GNSM comprises interaction patterns of specific gene-weights and features calculated from gene-to-gene interactions, wherein said interaction patterns are calculated based on the relationship or state of a gene with non-genomic features.
9. The method of claim 1, further comprising comparing a multi-modal signature matrix (MMSM) of the subject to an MMSM reference database, to conduct the WGFTA for autism risk screening, diagnosis or prognosis based on the divergence of the subject's MMSM to the MMSM reference database, wherein said MMSM is a matrix containing the quantification of non-genomic features obtained by clinical, behavioral, anatomical, and functional measurements.
10. The method of claim 9, wherein said non-genomic features comprise age, a GeoPreference test, an MRI/fMRI/DTI test, an ADOS test, or a CSBS test.
11. The method of claim 10, wherein said non-genomic feature is age.
12. The method of claim 1, further comprising comparing a collateral feature signature matrix (CFSM) of the subject to a CFSM reference database, to conduct the WGFTA for autism risk screening, diagnosis or prognosis based on the divergence of the subject's CFSM to the CFSM reference database, wherein said CFSM comprises features collateral to the subject, wherein said collateral features comprise analytes in maternal blood during pregnancy, sibling with autism, maternal genomic signature or preconditions, or adverse pre- or perinatal events.
13. A method for conducting a weighted gene feature test of autism (WGFTA) for autism screening, diagnosis or prognosis, comprising: a) obtaining a biological sample containing analytes of interest from cells contained in the sample; b) preparing a weighted gene signature matrix (WGSM) comprising expression levels of a selected set of two or more analyte-associated genes selected from the genes listed in Tables 1 through 2 and 16 through 25; c) calculating a weighted gene expression level of each gene in the selected set by multiplying a normalized gene expression value (NGEV) of the WGSM by a gene-specific weight of that gene derived from a computer-based bioinformatics processing of the relative expression levels of at least the selected set of genes from more than 40 healthy individuals and 40 autistic individuals compiled in a weighted gene expression reference database (WGERD); and d) establishing the divergence of the set of each weighted gene expression level of the subject to the WGERD, to thereby conduct the WGFTA to risk.
14. A method of conducting a weighted gene and feature test of autism (WGFTA) for autism screening, diagnosis or prognosis of a subject in need, comprising determining and normalizing gene expression values of a set of at least about 20 or more age-weighted signature genes as shown in Table 2.
15. The method of claim 14, wherein said genes are selected from the group consisting of gene IGF2R, ARAP3, FCGR3A, LOC389342, LOC648863, SPI1, LOC642567, CUTL1, PDE5A, ASAP1, KIAA0247, MAP1LC3A, ZNF185, IRS2, MTMR3, LOC100132510, IMPA2, NCALD, and MPL.
16. The method of claim 14, wherein said method comprises determining and normalizing gene expression values of a set of at least about 25 or more signature genes as shown in Table 2.
17. The method of claim 14, wherein said genes are selected from the group consisting of gene FCGR3A, LOC389342, IGF2R, ARAP3, PDE5A, MPL, CUTL1, LOC642567, SDPR, PTGS1, MIR1974, MAP1LC3A, LILRA3, LOC100133875, SPI1, LOC653737, IRS2, MAST3, NCF1B, STK40, KIAA0247, LOC648863, CTDSPL, and NCALD.
18. The method of claim 17, wherein said method provides at least 75% accuracy.
19. The method of claim 14, wherein said genes are selected from the group consisting of gene AK3, LOC100132510, ARID4A, CMTM4, KIAA1430, LOC441013, MAL, SETD1B, AKR1C3, ATXN7L3B, PARP15, AP2S1, CA2, PAN3, MTMR3, TOP1P2, UHRF2, LOC92755, EPOR, MED31, LOC389286, LOC646836, MSRB3, GPR65, SMPD1, GPX4, LOC100133770, PRKCB, and LOC100129424.
20. The method of claim 19, wherein said method provides at least 80% accuracy.
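The scoring recited in steps b) through e) of claim 1 (and steps c) and d) of claim 13) can be illustrated with a minimal Python sketch. The claims do not fix the normalization, the weights, or the divergence measure, so every such choice here is an assumption: each NGEV is taken as a z-score against the pooled reference samples, genes are kept only when their absolute weight lies between 0.50 and 1.00 (the range recited in claim 5), and the divergence is the subject's distance to the healthy reference centroid minus its distance to the autism reference centroid in weighted-NGEV space. The function names, gene labels, weights, and expression values are hypothetical and do not come from Tables 1 through 25.

```python
import math
import statistics
from typing import Dict, List

Profile = Dict[str, float]  # gene label -> expression level (hypothetical layout)


def select_genes(weights: Dict[str, float], lo: float = 0.50, hi: float = 1.00) -> List[str]:
    """Keep genes whose absolute weight lies in [lo, hi], as in the claim 5 range."""
    return sorted(g for g, w in weights.items() if lo <= abs(w) <= hi)


def normalize(sample: Profile, pooled: List[Profile], genes: List[str]) -> Profile:
    """Step b), assumed form: z-score each gene against the pooled reference samples."""
    ngev = {}
    for g in genes:
        ref = [p[g] for p in pooled]
        mu, sd = statistics.mean(ref), statistics.stdev(ref)
        ngev[g] = (sample[g] - mu) / sd if sd > 0 else 0.0
    return ngev


def apply_weights(ngev: Profile, weights: Dict[str, float]) -> Profile:
    """Step d): multiply each NGEV by its gene-specific weight."""
    return {g: v * weights[g] for g, v in ngev.items()}


def centroid(profiles: List[Profile], genes: List[str]) -> List[float]:
    """Per-gene mean of a group of weighted profiles."""
    return [statistics.mean(p[g] for p in profiles) for g in genes]


def wgfta_score(subject: Profile, healthy: List[Profile], autism: List[Profile],
                weights: Dict[str, float]) -> float:
    """Step e), assumed form: distance to the healthy centroid minus distance to the
    autism centroid in weighted-NGEV space. Positive means nearer the autism group."""
    genes = select_genes(weights)
    pooled = healthy + autism

    def to_weighted(sample: Profile) -> Profile:
        return apply_weights(normalize(sample, pooled, genes), weights)

    subj = [to_weighted(subject)[g] for g in genes]
    c_healthy = centroid([to_weighted(s) for s in healthy], genes)
    c_autism = centroid([to_weighted(s) for s in autism], genes)
    return math.dist(subj, c_healthy) - math.dist(subj, c_autism)


# Toy reference data with hypothetical gene labels; GENE_C is dropped (|weight| < 0.50).
weights = {"GENE_A": 0.8, "GENE_B": -0.6, "GENE_C": 0.2}
healthy = [{"GENE_A": 1.0, "GENE_B": 5.0, "GENE_C": 3.0},
           {"GENE_A": 1.2, "GENE_B": 4.8, "GENE_C": 3.1}]
autism = [{"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 3.0},
          {"GENE_A": 2.2, "GENE_B": 3.8, "GENE_C": 2.9}]
subject = {"GENE_A": 2.1, "GENE_B": 3.9, "GENE_C": 3.0}
print(round(wgfta_score(subject, healthy, autism, weights), 3))
```

A positive score places the subject nearer the autism reference group and a negative score nearer the healthy group. The centroid distance is used only because it is easy to follow; another divergence measure over the same weighted profiles could be substituted without changing steps b) and d).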
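Claims 8, 9 and 12 add further signature matrices (GNSM, MMSM and CFSM), each compared with its own reference database. The following minimal Python sketch assumes, without support from the claims, that an MMSM or CFSM can be flattened into one feature vector per individual (for example scaled test scores, or 0/1 flags such as a sibling with autism), that each matrix yields one divergence score computed as the distance to the healthy reference centroid minus the distance to the autism reference centroid after z-scoring, and that the per-matrix scores are merged by a weighted sum. The gene-to-gene interaction patterns of the GNSM are not modeled. The function names, modality weights, and feature values are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid_divergence(subject: Vector, healthy: List[Vector], autism: List[Vector]) -> float:
    """Assumed per-matrix divergence: z-score every feature against the pooled
    reference, then return distance to the healthy centroid minus distance to the
    autism centroid. Positive means the subject sits nearer the autism group."""
    pooled = list(healthy) + list(autism)
    n = len(subject)
    mus = [statistics.mean([r[i] for r in pooled]) for i in range(n)]
    sds = [statistics.stdev([r[i] for r in pooled]) for i in range(n)]

    def z(v: Vector) -> List[float]:
        # Constant features (sd == 0) carry no information and are set to 0.
        return [(v[i] - mus[i]) / sds[i] if sds[i] > 0 else 0.0 for i in range(n)]

    def mean_z(group: List[Vector]) -> List[float]:
        return [statistics.mean(col) for col in zip(*(z(v) for v in group))]

    s = z(subject)
    return math.dist(s, mean_z(healthy)) - math.dist(s, mean_z(autism))


def combined_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-matrix divergences; the weights are hypothetical."""
    return sum(weights[name] * value for name, value in scores.items())


# Toy MMSM rows: two scaled feature values per individual (illustrative values only).
mmsm_healthy = [[1.0, 2.0], [1.2, 2.2]]
mmsm_autism = [[3.0, 4.0], [3.2, 4.2]]
# Toy CFSM rows: 0/1 flags, e.g. [sibling_with_autism, adverse_perinatal_event].
cfsm_healthy = [[0.0, 0.0], [0.0, 0.0]]
cfsm_autism = [[1.0, 1.0], [1.0, 0.0]]

scores = {
    "MMSM": centroid_divergence([3.1, 4.1], mmsm_healthy, mmsm_autism),
    "CFSM": centroid_divergence([1.0, 1.0], cfsm_healthy, cfsm_autism),
}
print(combined_score(scores, {"MMSM": 0.6, "CFSM": 0.4}))
```

A gene-expression divergence score could be added to the same `scores` dictionary under its own key, so that the weighted sum covers the genomic and non-genomic matrices together; how the modality weights would be fitted is left open here.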